1. Introduction

One way to describe a stationary spatial point process is through some measure of clumpiness
of the events of the process. A commonly used measure of clumpiness is the reduced second
moment function K(t), defined as the expected number of events within distance t of a typical
event of the process divided by the intensity of the process. For a homogeneous Poisson process
on $\mathbb{R}^d$, $K(t) = b_d t^d$, where $b_d$ is the volume of a unit ball in d dimensions. Thus, values of K(t)
greater than $b_d t^d$ are indicative of a process that is clumpier than Poisson and values less than
$b_d t^d$ are indicative of a process that is more regular than Poisson. When estimating K(t) based
on observing a process within a bounded window W, a central problem is that for any event in W 
that is within t of the boundary of W, we do not know for sure how many other events are within 
t of it. Baddeley (1998) describes a number of ways of accounting for these edge effects. Although 
there is quite a bit of asymptotic theory for how these estimators behave when the underlying 
process is Poisson (Ripley 1988, Stein 1993), much less is known for non-Poisson processes. 

An interesting aspect of asymptotic theory for point processes is how one should take limits. 
Ripley (1988) and Stein (1993) consider a single growing window, which might appear to be the 
obvious way to take limits. However, Baddeley, et al. (1993) describe applications in which point 
processes are observed in many well-separated windows. For this setting, Baddeley and Gill (1997) 
argue that it is natural to consider taking limits by keeping the size of these windows fixed and 
letting their number increase. As they point out, one advantage of this approach is that the edge 
effects do not become negligible in the limit, since for any fixed t, the fraction of events that are 
within t of a window boundary does not tend to 0. Thus, for comparing different approaches for 
handling edge effects, increasing the number of windows may be more informative than allowing 
a single region to grow in all dimensions, for which the fraction of events that are within t of a 
window boundary does tend to 0. Another advantage of taking limits by letting the number of 
windows increase is that if the process is independent in different regions, then limit theorems are 
easier to prove. This is particularly the case when the windows are all well-separated translations 
of the same set so that the observations of the process on the multiple windows can be reasonably
modeled as iid realizations. Baddeley and Gill (1997) use this approach to obtain weak convergence 
results for estimators of K and other functions describing point process behavior. The resulting 
limiting variances are difficult to evaluate and Baddeley and Gill (1997) only give explicit results 
for what they call the sparse Poisson limit, in which the intensity of a homogeneous Poisson process 
tends to 0. 

This work studies the estimation of K for a process on $\mathbb{R}$ when the windows are segments of
varying lengths. The fact that the windows are one-dimensional greatly simplifies the calculation of 
estimators and permits the explicit derivation of some of their properties. The fact that the segment 
lengths vary provides for an interesting wrinkle on the approach of Baddeley and Gill (1997). 
Notably, simulation results in Section 6 show that the differences between certain estimators are 
much greater when the segment lengths are unequal. 

Section 2 describes a cosmological problem that motivated the present study. Vanden Berk, 
et al. (1996) put together a catalog of what are known as absorption-line systems, or absorbers, 
detected along the lines-of-sight of QSOs (quasi-stellar objects or quasars). This catalog, a pre- 
liminary version of which can be obtained from Daniel Vanden Berk (danvb@astro.as.utexas.edu), 
provides important evidence about the large-scale structure of the universe. To a first approxima- 
tion, in appropriate units, the locations of these absorbers along the lines-of-sight can be viewed 
as multiple realizations of a stationary point process along segments of varying length. 

Section 3 describes the estimators of K used in this paper and gives explicit expressions for the 
commonly used rigid motion correction and isotropic correction estimators when the observation 
region is a collection of line segments of varying lengths. In addition, Section 3 provides an explicit 
expression for a modification to the rigid motion correction advocated in Stein (1993). The fact 
that this estimator can be calculated explicitly is in contrast to the situation in more than one 
dimension, in which case, calculating this modified rigid motion correction requires numerous 
numerical integrations even for simple regions such as circles and rectangles. Finally, following 
on an idea of Picka (1996), Section 3 introduces another approach to modifying the rigid motion
correction and isotropic correction. When the underlying process is homogeneous Poisson, Picka's 
modification of the rigid motion correction has similar properties to the estimator proposed in 
Stein (1993), but theoretical results in Section 5 and simulation results in Section 6 suggest that 
his approach may have some advantages and we recommend the adoption of the resulting estimator 
for routine use. 

When the underlying process is homogeneous Poisson, Section 4 derives some asymptotic 
theory for the various estimators as the number of segments on which the process is observed in- 
creases. As in the case of a single growing observation window studied in Stein (1993), the modified 
rigid motion correction asymptotically minimizes the variance of the estimator of K(t) among a 
large class of estimators possessing a type of unbiasedness property. Furthermore, if the segments 
are of equal length, then it is possible to give explicit comparisons between various estimators. In 
particular, the ratio of the asymptotic mean squared error of the ordinary rigid motion correction 
to that of the modified rigid motion correction equals 1 plus a positive term proportional to the 
expected number of events per line segment. Thus, the benefit of the modification is modest when 
this expectation is small, around 1, say, but can be quite substantial when this expectation is large. 

Section 5 considers asymptotic results when the underlying process is not necessarily ho- 
mogeneous Poisson, the segments are all of equal length and the processes on different segments 
are independent. In this case, it is essentially trivial to obtain a central limit theorem for the 
estimators of K used here. From the general result, it is difficult to make comparisons between 
the various estimators. However, if the processes on the different segments are each homogeneous
Poisson but with intensities that vary from segment to segment according to some sequence of iid 
positive random variables, it is possible to give simple expressions for the asymptotic variances of 
the rigid motion correction and the two modifications of this estimator. These results show that 
the modification in Stein (1993) has strictly smaller asymptotic variance than the ordinary rigid 
motion correction. Furthermore, the modification of Picka (1996) has strictly smaller asymptotic 
variance than the modification in Stein (1993) unless the random intensities have zero variance, in
which case the two modified estimators have equal asymptotic variance.

Section 6 reports on the results of a simulation study comparing the ordinary rigid motion 
correction and the two modifications for both Poisson and non-Poisson processes, and equal and 
unequal segment lengths. While there is no theory showing the general superiority of the modified 
estimators for non-Poisson processes, the modified estimators do, for the most part, outperform 
the unmodified estimator. The advantages of the modified estimators tend to be larger when the
process is more regular than Poisson, when the segment lengths are unequal and when t is near 
the length of the longest available segment. 

Section 7 applies the rigid motion correction and the two modifications of it described in 
Section 3 to the estimation of K for the absorber catalog. In addition, approximate confidence 
intervals are obtained using bootstrapping based on viewing the segments as the sampling units. 
All three estimates are similar and confirm the finding in Quashnock and Stein (1999) of clear 
evidence of clustering up to at least 50 $h^{-1}$ Mpc. In addition, the confidence intervals based on
the modified procedures produce a slightly stronger case for clustering of absorbers beyond 100
$h^{-1}$ Mpc. Whether there is clustering of matter at such large scales and for the high redshifts in
the absorber catalog is a critical issue in modern cosmology, since presently used models for the 
evolution of the universe have difficulty explaining such clustering (Steidel, et al. 1998, Jing and 
Suto 1998). 

2. The absorber catalog 

The cosmological principle, which states that on large enough spatial scales, the distribution 
of matter in the universe is homogeneous and isotropic, is a central tenet of modern cosmology 
(Peebles 1993). In cosmology, it is convenient to measure distances in units of $h^{-1}$ Mpc, where
Mpc, or megaparsec, is $3.26 \times 10^6$ light years and h is an inexactly known dimensionless number
that is believed to be between 0.5 and 0.75. As is common in the cosmological literature, in 
reporting distances determined from redshifts, we will assume that Hubble's constant, $H_0$, equals
$100h$ km s$^{-1}$ Mpc$^{-1}$. To help calibrate one's thinking about such distances, 1 $h^{-1}$ Mpc is a typical
distance between neighboring galaxies. It is now generally agreed that galaxies cluster up to scales
of 10-20 $h^{-1}$ Mpc (Davis and Peebles 1983, Loveday, et al. 1995). Furthermore, clustering on
such scales can be reproduced by computer simulations of the evolution of the universe based on
our present understanding of this evolution (see Zhang, et al. 1998 and the references therein).
However, there is some evidence of clustering of matter on scales of up to 100 $h^{-1}$ Mpc (see
Quashnock, Vanden Berk and York (1996) and the references therein) and a few cosmologists have 
speculated that clustering may exist at all spatial scales (Coleman and Pietronero 1992, Sylos 
Labini, Montuori and Pietronero 1998), despite the fact that clustering at all scales contradicts 
both the cosmological principle and the considerable evidence that supports it (Peebles 1993, pp.
20, 45 and 221). Thus, determining the extent to which clustering of matter is present is of
fundamental importance to modern cosmology. 

One way to measure the clustering of matter is through the direct observation of large 
numbers of galaxies. Several galaxy surveys in various regions of the sky have been done in recent 
years (Martinez 1997); Pons-Bordería, et al. (1999) describe recent work on estimating second
moment structures of galaxy locations from such surveys. The presently ongoing Sloan Digital Sky
Survey will be by far the largest such survey and will contain roughly $10^8$ galaxies, approximately
$10^6$ of which will have spectroscopically measured redshifts (Margon 1999). An object's redshift
gives its velocity relative to the Earth, which, using Hubble's Law, yields its approximate distance
from the Earth. Galaxy surveys are limited by the fact that galaxies are difficult to observe directly
beyond several hundred $h^{-1}$ Mpc. QSOs, on the other hand, are extremely bright and focused
objects that can be readily detected at distances of several thousand $h^{-1}$ Mpc, going back to nearly
the beginning of the universe. Matter that falls on the line-of-sight between the QSO and the Earth 
can absorb light from the QSO and thus be detected from the Earth even though this matter cannot 
be directly observed. Certain types of matter absorb light in a characteristic pattern of frequencies 
that can be used to identify the matter and, through the redshift of this absorption pattern, the
relative velocity of this matter to the Earth. Astronomical objects detected in this way are called 
absorption-line systems or absorbers. As noted by Crotts, Melott, and York (1985), catalogs of 
absorbers provide a means for estimating the clustering of matter over very large spatial scales. 
Vanden Berk et al. (1996), Quashnock, Vanden Berk and York (1996) and Quashnock and Vanden 
Berk (1998) make use of an extensive catalog of heavy-element absorption-line systems drawn from 
the literature to investigate the clustering of matter at various scales. York, et al. (1991) describe 
an earlier version of this catalog and a preliminary version of an updated catalog is available from 
Daniel Vanden Berk (danvb@astro.as.utexas.edu). Here we will use the same absorber catalog as 
in Quashnock and Stein (1999), who examined clustering in 352 C IV absorbers (absorbers detected 
from the absorption-line patterns of C IV, or triply ionized carbon) along 274 QSO lines-of-sight. 
Although the relationship between C IV absorbers and galaxies is unclear, they do appear to track 
the general spatial patterns of galaxies (Lanzetta, et al. 1995, Quashnock and Vanden Berk 1998), 
and hence provide a plausible means for assessing the clustering of visible matter on large scales. 

Because the universe expands over time and, due to the finite velocity of light, the more 
distant an object the further in the past we observe it, the method used for converting redshifts 
into distances from Earth is critical to the analysis of this catalog. Redshifts are generally denoted 
by z and, according to Hubble's law, an object observed at redshift z is seen at a time when 
distances between objects were approximately $(1 + z)^{-1}$ times their present values. To correct
for the expansion, here, as in Quashnock and Stein (1999), we use what are called comoving 
coordinates, which scale up all distances to what they would be today if all the matter in the 
universe moved exactly with the Hubble flow (Peebles 1993). Thus, in examining the clustering of 
absorbers in comoving coordinates, we have removed the most important effects of the universe's 
expansion. If one did not make this correction, the volume density of absorbers would drop
approximately like $(1 + z)^3$ as z decreases and we move towards the present.

For various reasons, it is only possible to detect C IV absorbers along a segment of each
line-of-sight. The mean length of these segments in comoving units is 303.3 $h^{-1}$ Mpc, with a
range of 7.5 $h^{-1}$ Mpc to 439.8 $h^{-1}$ Mpc. For this catalog, the median redshift of the absorbers is
about 2.2, with the bulk of absorbers having redshifts from about 1.5 to 3. Our analysis acts as if 
clustering is both stationary in time and homogeneous in space. We are more accurately examining 
an average clustering over the range of redshifts in the sample at a cosmic epoch corresponding to 
a characteristic redshift of 2.2 (when the universe was about 1/3 its present scale and about 1/6 
its present age). Section 7 provides further discussion of this issue and its possible influence on our 
results. 

As in Quashnock and Stein (1999), we will act as if the absorber catalog can be viewed as 
multiple partial realizations of some stationary point process on $\mathbb{R}$ along a series of segments. In
particular, we will not attempt to use any information about the physical location of these segments 
in three-dimensional space. Using this simplification, we will then be able to apply the methods 
described in the next section to the absorber catalog. 

3. Methodology 

Suppose $M_1, \ldots, M_p$ are simple, stationary point processes on $\mathbb{R}$ with a common probability
law having intensity $\lambda$ and reduced second moment function K. We do not necessarily assume that
$M_1, \ldots, M_p$ are independent. For a Borel subset A of $\mathbb{R}$, let $M_j(A)$ be the number of events of $M_j$
contained in A. If $[0, Q_j]$ is the interval on which we observe $M_j$, then we can write the observation
domain as $D = \bigcup_{j=1}^p \{[0, Q_j], j\}$, so that $(x, \ell) \in D$ implies $\ell \in \{1, \ldots, p\}$ and $x \in [0, Q_\ell]$. Define
$N_j = M_j([0, Q_j])$ and $N_+ = \sum_{j=1}^p N_j$, and denote the realized value of $N_+$ by n. For $j = 1, \ldots, N_+$,
let $(X_j, L_j)$ be the random locations of these observed events, with realized values $(x_j, \ell_j)$ for
$j = 1, \ldots, n$.

The basic principle behind all edge-corrected estimators of K described by Ripley (1988)
is first to find an exactly unbiased estimator of $\lambda^2 \times$ (volume of observation domain) $\times K(t)$ and
then to divide by an estimator of $\lambda^2 \times$ (volume of observation domain). Here, the volume of the
observation domain is $Q_+ = \sum_{j=1}^p Q_j$. For a symmetric function $\phi$ on $D \times D$, define
$T(\phi) = \sum_{j \ne k} \phi((X_j, L_j), (X_k, L_k))$. Then the unbiasedness constraint requires that

$$E\,T(\phi) = \lambda^2 Q_+ K(t) \qquad (1)$$

for any reduced second moment function K. Estimating $\lambda^2$ by $N_+(N_+ - 1)/Q_+^2$ yields

$$\hat K(t) = \begin{cases} Q_+ T(\phi)/\{N_+(N_+ - 1)\}, & N_+ > 1, \\ 0, & \text{otherwise}, \end{cases}$$

as a natural estimator of K(t).

There is an infinite array of functions $\phi$ satisfying (1). Two popular choices are the rigid
motion correction (Ohser and Stoyan 1981) and the isotropic correction (Ripley 1976). Asymptotic
results in Sections 4 and 5 suggest that modified versions of the rigid motion correction have good
large sample properties when the underlying process is Poisson, so we focus on this correction
here, although we also give some results for the isotropic correction for comparison. It is fairly
elementary to prove that the rigid motion correction satisfies (1) when the observation domain D
is a subset of $\mathbb{R}$. First, for a stationary point process M on $\mathbb{R}$ with intensity $\lambda$, define the reduced
second moment measure $\mathcal{K}$ by $\lambda^2 \mathcal{K}(ds)\,dx = 2E\{M(dx)M(x + ds)\}$, in which case the reduced
second moment function K is given by $K(t) = \int_{(0,t]} \mathcal{K}(ds)$. Denote the indicator function by $1\{\cdot\}$,
use $|A|$ to indicate the Lebesgue measure of the set $A \subset \mathbb{R}$ and $A_s$ to indicate the set A translated
by the amount s. The rigid motion correction is given by

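$$\phi(x, y) = \frac{1\{|x - y| \le t\}\,|D|}{|D \cap D_{x-y}|}.$$

We can then write

$$T(\phi) = \int_{s \in [-t,0) \cup (0,t]} \int_{x \in \mathbb{R}} \frac{1\{x \in D,\ x + s \in D\}\,|D|}{|D \cap D_s|}\, M(dx)\, M(x + ds) = 2 \int_{s \in (0,t]} \int_{x \in \mathbb{R}} \frac{1\{x \in D,\ x + s \in D\}\,|D|}{|D \cap D_s|}\, M(dx)\, M(x + ds),$$

where the first equality uses $|D \cap D_{-s}| = |D \cap D_s|$ and the second follows by exchanging the roles of the two points in each pair, so that

$$E\,T(\phi) = \lambda^2 \int_{s \in (0,t]} \int_{x \in \mathbb{R}} \frac{1\{x \in D,\ x + s \in D\}\,|D|}{|D \cap D_s|}\, dx\, \mathcal{K}(ds) = \lambda^2 |D| \int_{s \in (0,t]} \mathcal{K}(ds) = \lambda^2 |D|\, K(t),$$

since $\int_{\mathbb{R}} 1\{x \in D,\ x + s \in D\}\, dx = |D \cap D_{-s}| = |D \cap D_s|$.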



One way to view the setting where D is a collection of line segments is to think of these
segments as being widely spaced intervals on $\mathbb{R}$, in which case we just have a special case of the
treatment in the preceding paragraph. However, it will be helpful in the subsequent development
to think of D as $\bigcup_{j=1}^p \{[0, Q_j], j\}$. The rigid motion correction can then be defined by taking $\phi$ to be

$$\phi^R((x, k), (y, \ell)) = \frac{Q_+\, 1\{|x - y| \le t,\ k = \ell\}}{\sum_{i=1}^p (Q_i - |x - y|)_+},$$

where $a_+ = \max(a, 0)$; the denominator is $|D \cap D_{x-y}|$ when the segments are viewed as widely
spaced intervals. To write the isotropic correction in terms of a symmetric function, let

$$\phi^I((x, k), (y, \ell)) = \frac{Q_+\,\{a_\ell(x, y) + a_\ell(y, x)\}\, 1\{|x - y| \le t,\ k = \ell\}}{Q_+ - \sum_{j=1}^p \{(2|x - y| - Q_j)_+ \wedge Q_j\}}, \qquad (2)$$

where $a_\ell(x, y)^{-1} = 1\{x + |y - x| \le Q_\ell\} + 1\{x - |y - x| \ge 0\}$ is the number of points at distance
$|y - x|$ from x that lie in $[0, Q_\ell]$. Define $\hat K_R(t) = Q_+ T(\phi^R)/\{N_+(N_+ - 1)\}$ and
$\hat K_I(t) = Q_+ T(\phi^I)/\{N_+(N_+ - 1)\}$, where it is understood that $\hat K_R(t) = \hat K_I(t) = 0$ for $N_+ \le 1$.
We have used Ohser's extension of the isotropic correction to cover the case $t > \frac{1}{2}\min(Q_1, \ldots, Q_p)$
(Ohser 1983). As Ripley (1988, p. 32) notes, this extension is generally not of much practical value
when there is a single contiguous observation window. However, when there are multiple windows
of various sizes, the extension is critical. For the absorber catalog, for example, one is certainly
interested in estimating K at distances greater than 3.75 $h^{-1}$ Mpc, the value of $\frac{1}{2}\min(Q_1, \ldots, Q_p)$
in the catalog.

Note that $\phi^I((x, k), (y, \ell)) = \phi^R((x, k), (y, \ell)) = 0$ if $k \ne \ell$, which just says that pairs of
observations on different segments do not contribute to the estimate of K(t). Since we have made
no assumption about the joint distribution of $M_1, \ldots, M_p$, for (1) to be valid it is necessary to
assume $\phi((x, k), (y, \ell)) = 0$ whenever $k \ne \ell$. Thus, throughout this work, we will only consider $\phi$
satisfying

(A) $\phi((x, k), (y, \ell)) = 0$ for $k \ne \ell$.
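For readers who want to experiment, the following is a minimal Python/NumPy sketch of $\hat K_R(t)$ on segments; it is an illustration under stated assumptions, not the authors' implementation. It assumes the data come as one array of event positions per segment together with the lengths $Q_j$, and that $t < \max_j Q_j$ so that $U(r) = \sum_j (Q_j - r)_+$ stays positive for the distances that occur. The names `overlap_length` and `k_hat_rigid` are hypothetical. Consistent with (A), only ordered pairs on the same segment enter $T(\phi^R)$.

```python
import numpy as np


def overlap_length(r, Q):
    """U(r) = sum_j (Q_j - r)_+, i.e. |D intersected with D shifted by r| for widely spaced segments."""
    r = np.asarray(r, dtype=float)
    return np.maximum(np.subtract.outer(np.asarray(Q, dtype=float), r), 0.0).sum(axis=0)


def k_hat_rigid(events, Q, t):
    """Ordinary rigid motion estimator hat-K_R(t) = Q_+ T(phi^R) / {N_+ (N_+ - 1)}.

    events: list of 1-D arrays; events[j] holds the positions observed on [0, Q[j]].
    Pairs on different segments never contribute (condition (A)).
    """
    Q = np.asarray(Q, dtype=float)
    Q_plus = Q.sum()
    n = sum(len(x) for x in events)
    if n <= 1:
        return 0.0
    T = 0.0
    for x in events:
        x = np.asarray(x, dtype=float)
        d = np.abs(np.subtract.outer(x, x))                 # within-segment distances
        pairs = (d <= t) & ~np.eye(len(x), dtype=bool)      # ordered pairs j != k with |x_j - x_k| <= t
        T += np.sum(Q_plus / overlap_length(d[pairs], Q))   # phi^R = Q_+ / U(|x - y|)
    return Q_plus * T / (n * (n - 1))


# Simulated check: iid Poisson processes of intensity 1 on 1000 segments of length 10,
# for which K(t) = 2t.
rng = np.random.default_rng(0)
Q = np.full(1000, 10.0)
events = [rng.uniform(0.0, q, rng.poisson(q)) for q in Q]
print(k_hat_rigid(events, Q, t=1.0))   # should be near 2
```

The explicit loop over segments keeps the sketch close to the definition of $T(\phi)$; for large catalogs one would vectorize or use a neighbor search instead.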

We next show how to apply to the present setting the method developed in Stein (1993) for
improving upon any estimator of K of the form $Q_+ T(\phi)/\{N_+(N_+ - 1)\}$ with $\phi$ satisfying (1). Suppose
$(X, L)$ is uniformly distributed on D in the sense that $P(L = \ell) = Q_\ell/Q_+$ and the density of X given
$L = \ell$ is uniform on $[0, Q_\ell]$. Then $M_1, \ldots, M_p$ stationary with common distribution imply that for any
real-valued function g for which $E|g(X, L)| < \infty$, $E\sum_{j=1}^{N_+} g(X_j, L_j) = \lambda Q_+ E g(X, L)$, so that
$\sum_{j=1}^{N_+}\{g(X_j, L_j) - E g(X, L)\}$ is an unbiased estimator of 0 (recall that $E N_+ = \lambda Q_+$). The idea in
Stein (1993) is to choose g to minimize

$$\operatorname{var}_n\Big[T(\phi) - \sum_{j=1}^{n}\{g(X_j, L_j) - E g(X, L)\}\Big],$$

where $\operatorname{var}_n$ means to compute the variance under binomial sampling: $N_+ = n$ is fixed and, for
$j = 1, \ldots, n$, the $(X_j, L_j)$ are independent and all have the same distribution as $(X, L)$. Proposition 1
in Stein (1993) shows that for $n > 1$ and $(y, m) \in D$, a minimizing g is $2(n - 1)h(y, m; \phi)/Q_+$,
where $h(y, m; \phi) = \sum_{\ell=1}^p \int_0^{Q_\ell} \phi((x, \ell), (y, m))\,dx$. Under (A), $h(y, m; \phi) = \int_0^{Q_m} \phi((x, m), (y, m))\,dx$.
Now define

$$T^*(\phi) = T(\phi) - \frac{2(N_+ - 1)}{Q_+}\sum_{j=1}^{N_+}\{h(X_j, L_j; \phi) - E h(X, L; \phi)\}.$$

Note that if $\phi$ satisfies (1), $E h(X, L; \phi) = 2t$. Under binomial sampling, we always have
$\operatorname{var}_n\{T^*(\phi)\} \le \operatorname{var}_n\{T(\phi)\}$. This suggests that the estimator $\tilde K(t) = Q_+ T^*(\phi)/\{N_+(N_+ - 1)\}$ for $N_+ > 1$,
and $\tilde K(t) = 0$ otherwise, may be preferred over $\hat K(t)$. As with the unmodified estimators, $\tilde K_R(t)$
indicates that $\phi = \phi^R$ and $\tilde K_I(t)$ indicates that $\phi = \phi^I$.

Picka (1996) suggests another approach to modifying estimates of second moment measures.
He considered random sets for which the probability of any fixed point being in the random set is
positive, but his approach can also be applied to point processes, for which this probability is 0. For
point processes, his idea corresponds to using an estimator of $\lambda Q_+$ other than $N_+$ in $\hat K$. For any
real-valued function c on D satisfying $\sum_{\ell=1}^p \int_0^{Q_\ell} c(x, \ell)\,dx = Q_+$, $\check\lambda_c = Q_+^{-1}\sum_{j=1}^{N_+} c(X_j, L_j)$ is an unbiased
estimator of $\lambda$. Let us consider estimators of K(t) of the form $Q_+ T(\phi)/\{\check\lambda_c Q_+(\check\lambda_c Q_+ - 1)\}$. It is
not generally possible to calculate the exact variance of such estimators under binomial sampling.
However, for $Q_+$ sufficiently large, $\check\lambda_c - \lambda$ and $Q_+^{-1} T(\phi) - \lambda^2 K(t)$ should be small in probability,
which suggests using a first-order Taylor series approximation to obtain

$$\frac{Q_+ T(\phi)}{\check\lambda_c Q_+(\check\lambda_c Q_+ - 1)} \approx \frac{T(\phi)}{\lambda^2 Q_+} - \frac{2K(t)(\check\lambda_c - \lambda)}{\lambda}. \qquad (3)$$

For a given $\phi$ and subject to c satisfying the unbiasedness constraint, now consider minimizing
the variance of the right side of (3) when $M_1, \ldots, M_p$ are iid Poisson processes with intensity $\lambda$.
It is a straightforward variational problem to show that a minimizing c is given by $c(x, \ell; \phi) =
h(x, \ell; \phi)/(2t)$. Define

$$\check K(t) = \frac{Q_+ T(\phi)}{\big\{\sum_{j=1}^{N_+} c(X_j, L_j; \phi)\big\}\big\{\sum_{j=1}^{N_+} c(X_j, L_j; \phi) - 1\big\}}$$

for $N_+ > 1$ and $\check K(t) = 0$ otherwise. As with $\hat K$ and $\tilde K$, subscripts R or I on $\check K$ indicate that
$\phi = \phi^R$ or $\phi = \phi^I$.



When $M_1, \ldots, M_p$ are iid Poisson processes, $\tilde K(t)$ and $\check K(t)$ should behave similarly. To see
this, first use Taylor series to obtain

$$\tilde K(t) \approx \frac{1}{\lambda^2 Q_+} T(\phi) - \frac{2}{\lambda Q_+}\sum_{j=1}^{N_+} h(X_j, L_j; \phi) + 2\{2t - K(t)\}\frac{N_+}{\lambda Q_+} + 2K(t).$$

From this approximation and (3), when $K(t) = 2t$, both $\tilde K$ and $\check K$ are approximately

$$\frac{1}{\lambda^2 Q_+} T(\phi) - \frac{2}{\lambda Q_+}\sum_{j=1}^{N_+} h(X_j, L_j; \phi) + 4t.$$

Thus, for $Q_+$ large, the two estimators will be similar when $M_1, \ldots, M_p$ are iid Poisson processes,
but they are not necessarily similar otherwise.
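To see the similarity numerically, here is a hypothetical Python sketch (not the authors' code) that computes $\tilde K_R(t)$ and $\check K_R(t)$ on simulated iid Poisson data. It assumes all segments share a common length Q and $0 < t < Q$, in which case $\phi^R = Q\,1\{|x-y| \le t\}/(Q - |x-y|)$ on each segment, and integrating gives $h(x, \ell; \phi^R) = Q[-\log\{1 - (x \wedge t)/Q\} - \log\{1 - ((Q - x) \wedge t)/Q\}]$. The names `h_rigid_equal` and `modified_rigid_estimates` are ours.

```python
import numpy as np


def h_rigid_equal(x, Q, t):
    """h(x, l; phi^R) for segments of common length Q (assumes 0 < t < Q)."""
    x = np.asarray(x, dtype=float)
    return Q * (-np.log1p(-np.minimum(x, t) / Q) - np.log1p(-np.minimum(Q - x, t) / Q))


def modified_rigid_estimates(events, Q, t):
    """Return (tilde-K_R(t), check-K_R(t)) for p segments of common length Q."""
    p = len(events)
    Q_plus = p * Q
    n = sum(len(x) for x in events)
    if n <= 1:
        return 0.0, 0.0
    T = 0.0          # T(phi^R), summed over ordered pairs on the same segment
    h_sum = 0.0      # sum_j h(X_j, L_j; phi^R)
    for x in events:
        x = np.asarray(x, dtype=float)
        d = np.abs(np.subtract.outer(x, x))
        pairs = (d <= t) & ~np.eye(len(x), dtype=bool)
        T += np.sum(Q / (Q - d[pairs]))          # phi^R = Q_+/U(r) = Q/(Q - r) here
        h_sum += np.sum(h_rigid_equal(x, Q, t))
    # Stein: T* = T - 2(N_+ - 1)/Q_+ * sum_j {h_j - 2t}, using E h(X, L; phi^R) = 2t
    T_star = T - 2.0 * (n - 1) / Q_plus * (h_sum - 2.0 * t * n)
    k_tilde = Q_plus * T_star / (n * (n - 1))
    # Picka: replace N_+ by sum_j c(X_j, L_j) with c = h / (2t)
    C = h_sum / (2.0 * t)
    k_check = Q_plus * T / (C * (C - 1.0))
    return k_tilde, k_check


rng = np.random.default_rng(1)
Q, t = 10.0, 1.0
events = [rng.uniform(0.0, Q, rng.poisson(Q)) for _ in range(1000)]   # intensity 1, so K(t) = 2t
k_tilde, k_check = modified_rigid_estimates(events, Q, t)
print(k_tilde, k_check)   # both should be near 2 and close to each other
```

Because both estimators reuse the same $T(\phi^R)$ and the same values $h(X_j, L_j; \phi^R)$, computing them together costs essentially nothing extra.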

Even for simple regions in two or more dimensions, calculating $h(\cdot\,; \phi)$ requires numerical
integrations. However, when the observation region is $D = \bigcup_{j=1}^p \{[0, Q_j], j\}$, it is possible to
give an explicit expression for $h(x, \ell; \phi^R)$ for $(x, \ell) \in D$. For convenience, we will assume that the
$Q_j$s have been arranged in increasing order. For $r < Q_p$, define $j(r) = \min_{1 \le j \le p}\{j : Q_j > r\}$ and
let $U(r) = \sum_{j=1}^p (Q_j - r)_+$. For $j = 1, \ldots, p$, let $u_j = U(Q_j)$, and set $Q_0 = 0$ so that $u_0 = Q_+$.
Furthermore, define

$$\kappa(x, t) = \sum_{i=1}^{j(x \wedge t) - 1} \frac{1}{p - i + 1}\log\frac{u_{i-1}}{u_i} + \frac{1}{p - j(x \wedge t) + 1}\log\frac{u_{j(x \wedge t) - 1}}{U(x \wedge t)},$$

where a sum whose upper limit is less than its lower limit is defined to be 0 and $x \wedge t$ is the
minimum of x and t. Then

$$Q_+^{-1} h(x, \ell; \phi^R) = \kappa(x, t) + \kappa(Q_\ell - x, t) \qquad (4)$$

(see the appendix); equivalently, $\kappa(x, t) = \int_0^{x \wedge t} U(r)^{-1}\,dr$, because U is linear with slope $-(p - i + 1)$
on $[Q_{i-1}, Q_i)$. If the segment lengths are all equal, $\kappa(x, t) = p^{-1}\log[Q/\{Q - (x \wedge t)\}]$.
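A hedged check of the explicit formula: since U is piecewise linear, κ is the integral of $1/U$ from 0 to $x \wedge t$ evaluated in closed form. The Python sketch below (hypothetical names `kappa_explicit` and `kappa_quadrature`; it assumes $x \wedge t < Q_p$) compares the closed form with brute-force midpoint quadrature for an arbitrary, made-up set of segment lengths.

```python
import numpy as np


def kappa_explicit(x, t, Q):
    """kappa(x, t) from the closed form preceding (4); Q is sorted internally."""
    Q = np.sort(np.asarray(Q, dtype=float))
    p = len(Q)
    s = min(x, t)
    if s >= Q[-1]:
        raise ValueError("need min(x, t) < Q_p")
    U = lambda r: np.maximum(Q - r, 0.0).sum()          # U(r) = sum_j (Q_j - r)_+
    u = [U(q) for q in np.concatenate(([0.0], Q))]     # u[0] = Q_+, u[j] = U(Q_j)
    js = int(np.argmax(Q > s)) + 1                      # j(s) = min{j : Q_j > s}, 1-based
    total = 0.0
    for i in range(1, js):                              # empty sum when j(s) = 1
        total += np.log(u[i - 1] / u[i]) / (p - i + 1)
    return total + np.log(u[js - 1] / U(s)) / (p - js + 1)


def kappa_quadrature(x, t, Q, m=200_000):
    """Midpoint-rule approximation of the integral of 1/U(r) over [0, min(x, t)]."""
    Q = np.asarray(Q, dtype=float)
    s = min(x, t)
    r = (np.arange(m) + 0.5) * s / m
    U = np.maximum(Q[:, None] - r[None, :], 0.0).sum(axis=0)
    return np.sum(1.0 / U) * s / m


Q = [3.0, 7.5, 4.0, 10.0, 6.0]                          # illustrative segment lengths
print(kappa_explicit(5.0, 4.5, Q), kappa_quadrature(5.0, 4.5, Q))
# By (4), h(x, l; phi^R) = Q_+ * (kappa(x, t) + kappa(Q_l - x, t)).
```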

It is also possible to evaluate $h(x, \ell; \phi^I)$ explicitly, but the resulting expression is rather
cumbersome. If $t < \frac{1}{2}\min(Q_1, \ldots, Q_p)$, then the denominator in the definition of $\phi^I$ in (2) equals
$Q_+$ whenever $|x - y| \le t$, which greatly simplifies matters. In this case, it is possible to show that

$$h(x, \ell; \phi^I) = t + (x \wedge t) + \{(Q_\ell - x) \wedge t\} - \tfrac{1}{2}\big(\tfrac{x}{2} \wedge t\big) - \tfrac{1}{2}\big(\tfrac{Q_\ell - x}{2} \wedge t\big).$$

A second special case yielding a simple result is when $Q_1 = \cdots = Q_p = Q$. When $t \le \frac{1}{2}Q$, the
preceding expression for h applies, and for $t > \frac{1}{2}Q$,

$$h(x, \ell; \phi^I) = \frac{3Q}{4} + \{x \wedge (Q - x)\} + Q\log\frac{Q}{2[\{x \wedge (Q - x)\} \vee (Q - t)]},$$

where $x \vee y$ is the maximum of x and y.
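The two formulas for $h(x, \ell; \phi^I)$ with a common length Q can be coded directly and sanity-checked: for any $\phi$ satisfying (1), the average of h over a segment should be 2t, and the two branches should agree at $t = Q/2$. This is a hypothetical Python sketch (the name `h_iso_equal` is ours), assuming $0 < t < Q$.

```python
import numpy as np


def h_iso_equal(x, Q, t):
    """h(x, l; phi^I) when all segments have length Q, for 0 < t < Q."""
    x = np.asarray(x, dtype=float)
    if t <= Q / 2:
        return (t + np.minimum(x, t) + np.minimum(Q - x, t)
                - 0.5 * np.minimum(x / 2, t) - 0.5 * np.minimum((Q - x) / 2, t))
    m = np.minimum(x, Q - x)                            # x ^ (Q - x)
    return 0.75 * Q + m + Q * np.log(Q / (2.0 * np.maximum(m, Q - t)))


Q = 10.0
xs = (np.arange(100_000) + 0.5) / 100_000 * Q           # midpoint grid on [0, Q]
for t in (1.0, 3.0, 5.0, 6.5, 9.0):
    print(t, h_iso_equal(xs, Q, t).mean())               # each mean should be close to 2t
```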

There is a considerable literature in astrophysical journals on estimating second order char- 
acteristics of galaxy locations based on galaxy surveys in large, contiguous regions of the sky. 
Martinez (1997) and Stoyan and Stoyan (2000) provide two recent reviews of this work. Astro- 
physicists have generally focused on estimating the pair correlation function, which is, after a 
normalization, just the derivative of the K function. For example, for a stationary point process 
M on $\mathbb{R}$, assuming K is differentiable, the pair correlation function is $\frac{1}{2}K'$. Similar to $\tilde K$ here,
Landy and Szalay (1993) make use of unbiased estimators of 0 to modify estimators of second order
characteristics. Moreover, similar to $\check K$, Hamilton (1993) describes estimators of the pair correlation
function of the form $T(\phi)/\lambda^2$ in which $\lambda^2$ is estimated by something other than the obvious
estimator. We prefer to estimate K rather than the pair correlation function because it separates
the problem of handling edge effects from that of density estimation and the consequent smoothing 
problem. If one wants to estimate the pair correlation function, we recommend first computing 
an appropriately edge-corrected estimate of K and then differentiating a smoothed version of this 
estimate. 

4. Asymptotic theory when the truth is Poisson 

There are a number of ways one might take limits to study the properties of the estimators 
proposed in the previous section. One possibility would be to fix p and let the $Q_j$s tend to $\infty$. In this
approach, the fraction of the observation region within a fixed distance of an endpoint of a segment
tends to 0 and, as in Ripley (1988) and Stein (1993), the variances of all reasonable estimators of
K(t) for fixed t have the same first-order asymptotic behavior under binomial sampling. However,
for the absorber catalog, in which p = 274 and the average number of absorbers per line-of-sight is 1.28,
a more relevant choice is to uniformly bound the $Q_j$s and let $p \to \infty$. This limiting approach keeps the
fraction of the observation region within a fixed distance of an endpoint of a segment bounded away
from 0, with the result that the differences between various estimators under binomial sampling
show up in the leading terms for the asymptotic variance. Hansen, Gill and Baddeley (1996) and
Baddeley and Gill (1997) take a similar asymptotic approach for studying estimators of properties 
of spatial point processes based on observing the process in an increasing number of identical and 
distantly spaced windows. 

We now consider adapting the asymptotic results in Ripley (1988) and Stein (1993) to the 
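present setting. First, we give exact expressions for the variance under binomial sampling of both
$T(\phi)$ and $T^*(\phi)$, and hence of $\hat K(t)$ and $\tilde K(t)$. Following Ripley (1988), for a symmetric function $\phi$ on
$D \times D$ satisfying (A), define

$$S(\phi) = \sum_{j=1}^p \int_0^{Q_j}\!\!\int_0^{Q_j} \phi((x, j), (y, j))\,dx\,dy,$$

$$S_1(\phi) = \sum_{j=1}^p \int_0^{Q_j}\Big\{\int_0^{Q_j} \phi((x, j), (y, j))\,dx\Big\}^2 dy,$$

and

$$S_2(\phi) = \sum_{j=1}^p \int_0^{Q_j}\!\!\int_0^{Q_j} \phi((x, j), (y, j))^2\,dx\,dy.$$

Under (A) (Ripley 1988),

$$\operatorname{var}_n\{T(\phi)\} = \frac{2n(n - 1)}{Q_+^2}\Big\{S_2(\phi) + \frac{2n - 4}{Q_+}S_1(\phi) - \frac{2n - 3}{Q_+^2}S(\phi)^2\Big\} \qquad (5)$$

and (Stein 1993)

$$\operatorname{var}_n\{T^*(\phi)\} = \frac{2n(n - 1)}{Q_+^2}\Big\{S_2(\phi) - \frac{2}{Q_+}S_1(\phi) + \frac{1}{Q_+^2}S(\phi)^2\Big\}. \qquad (6)$$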
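Equations (5) and (6) are straightforward to evaluate once $S$, $S_1$ and $S_2$ are known. The hypothetical Python sketch below does this for $\phi^R$ with p segments of common length Q (the values of p, Q, t and n are arbitrary), computing $S$ and $S_2$ as one-dimensional integrals over $r = |x - y|$, whose measure on $[0, Q]^2$ is $2(Q - r)\,dr$, and $S_1$ from the equal-length form of h. Midpoint quadrature stands in for the appendix's closed forms, which are not reproduced here. Since $S_1(\phi) \ge S(\phi)^2/Q_+$ by Cauchy-Schwarz, (5) minus (6) is never negative.

```python
import numpy as np


def ripley_sums_rigid_equal(p, Q, t, m=200_000):
    """S, S_1, S_2 for phi^R on p segments of common length Q (0 < t < Q), by midpoint quadrature."""
    r = (np.arange(m) + 0.5) * t / m                  # grid for r = |x - y| on [0, t]
    density = 2.0 * (Q - r)                           # measure of {|x - y| in dr} on [0, Q]^2, per dr
    phi = Q / (Q - r)                                 # phi^R = Q_+ / U(r), with U(r) = p (Q - r)
    S = p * np.sum(density * phi) * t / m
    S2 = p * np.sum(density * phi**2) * t / m
    x = (np.arange(m) + 0.5) * Q / m                  # grid on [0, Q]
    h = Q * (-np.log1p(-np.minimum(x, t) / Q) - np.log1p(-np.minimum(Q - x, t) / Q))
    S1 = p * np.sum(h**2) * Q / m
    return S, S1, S2


def var_T(n, Q_plus, S, S1, S2):
    """Equation (5): variance of T(phi) under binomial sampling with N_+ = n."""
    return 2 * n * (n - 1) / Q_plus**2 * (S2 + (2 * n - 4) / Q_plus * S1 - (2 * n - 3) / Q_plus**2 * S**2)


def var_T_star(n, Q_plus, S, S1, S2):
    """Equation (6): variance of T*(phi) under binomial sampling with N_+ = n."""
    return 2 * n * (n - 1) / Q_plus**2 * (S2 - 2 / Q_plus * S1 + S**2 / Q_plus**2)


p, Q, t, n = 50, 1.0, 0.3, 60
S, S1, S2 = ripley_sums_rigid_equal(p, Q, t)
print(var_T(n, p * Q, S, S1, S2), var_T_star(n, p * Q, S, S1, S2))
```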

We now want to study what happens as $p \to \infty$. Suppose $Q_1, Q_2, \ldots$ is a sequence of positive
numbers and the subscript p is used to indicate the dependence of a term on the number of
segments observed, so that $D_p = \bigcup_{j=1}^p \{[0, Q_j], j\}$, $Q_{+p} = \sum_{j=1}^p Q_j$ and $N_{+p}$ is the total number of
events on $D_p$. Suppose $\{\phi_p\}$ is a sequence of functions for which the domain of $\phi_p$ is $D_p \times D_p$ and
$\phi_p$ is symmetric for all p. In addition to $\phi_p$ satisfying (A) for all p, we will assume the following
regularity conditions:

(B) The $\phi_p$s are uniformly bounded;

(C) For each p, $\phi_p$ satisfies the unbiasedness constraint in (1);

(D) The $Q_j$s are bounded away from 0 and $\infty$.

Under (A)-(D), we have $S(\phi_p) = 2tQ_{+p} = O(p)$, $S_1(\phi_p) = O(p)$ and $S_2(\phi_p) = O(p)$ but is not $o(p)$.
It follows that as $p \to \infty$,

$$S_2(\phi_p) - \frac{2}{Q_{+p}}S_1(\phi_p) + \frac{1}{Q_{+p}^2}S(\phi_p)^2 = S_2(\phi_p)\{1 + O(p^{-1})\}. \qquad (7)$$

Comparing (6) and (7) suggests that minimizing $S_2(\phi_p)$ subject to (A)-(D) is nearly the same as
minimizing $\operatorname{var}_n\{T^*(\phi_p)\}$. Stein (1993) shows that subject to (C), the rigid motion correction gives
a minimizer of $S_2(\phi_p)$. The appendix gives an explicit expression for $S_2(\phi^R)$ in terms of elementary
functions.

We next obtain an analog to Proposition 2 in Stein (1993), which demonstrates the asymptotic
optimality under the Poisson model for $\tilde K_R$ among a certain class of estimators as the dimensions
of a single observation window increase. For a sequence of functions $\{\phi_p\}$ on $D_p \times D_p$ and a
sequence of functions $\{g_p\}$ on $D_p \times \{0, 1, \ldots\}$, define the statistic $\Theta(\phi_p, g_p)$ by

$$\Theta(\phi_p, g_p) = \frac{Q_{+p}\big\{T(\phi_p) + \sum_{j=1}^{N_{+p}} g_p((X_j, L_j), N_{+p})\big\}}{N_{+p}(N_{+p} - 1)}$$

if $N_{+p} > 1$ and 0 otherwise. Write $E_\lambda$ to indicate expectations assuming $M_1, M_2, \ldots$ are
independent Poisson processes with constant intensity $\lambda$ independent of p. All ensuing asymptotic
results in the rest of this section involve expectations over the Poisson model and can be proven by
first conditioning on $N_{+p}$, using the fact that under this model the conditional distribution of the
observed events on $D_p$ follows binomial sampling, and finally averaging over the distribution
of $N_{+p}$, which follows a Poisson distribution with mean $\lambda Q_{+p}$.

Proposition 1. Suppose $\{\phi_p\}$ satisfies (A)-(C), $E_\lambda\{\sum_{j=1}^{N_{+p}} |g_p((X_j, L_j), N_{+p})|\} < \infty$ for all
p, the $Q_j$s satisfy (D) and $p^{-1}\sum_{j=1}^p (Q_j - t)_+$ is bounded away from 0 as $p \to \infty$. Then

$$p^2\big[E_\lambda\{\tilde K_R(t) - 2t\}^2 - E_\lambda\{\Theta(\phi_p, g_p) - 2t\}^2\big]$$

is bounded from above as $p \to \infty$. □

The assumption that $p^{-1}\sum_{j=1}^p (Q_j - t)_+$ is bounded away from 0 as $p \to \infty$ guarantees that the
rigid motion corrections $\{\phi^R_p\}$ satisfy (B). Since, under the conditions of Proposition 1,
$E_\lambda\{\tilde K_R(t) - 2t\}^2 = O(p^{-1})$ as $p \to \infty$, this result says that when the underlying processes are
independent Poisson with equal intensity, $\tilde K_R$ asymptotically minimizes the mean squared error
among all sequences of estimators of the form considered in the proposition.

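Let us now make some comparisons of the asymptotic mean squared errors of some estimators
of $K(t)$ under the Poisson model when all $Q_j$'s equal $Q$ and $s = t/Q$. From (6), we get
$E_\lambda\{\tilde K(t) - 2t\}^2 \sim 2S_2(\phi_p)/(\lambda^2 Q_{+p}^2)$. Thus, (17) in the appendix implies

$$E_\lambda\{\tilde K_R(t) - 2t\}^2 \sim -\frac{4}{\lambda^2 p}\log(1 - s) \tag{8}$$

and (20) in the appendix implies

$$E_\lambda\{\tilde K_I(t) - 2t\}^2 \sim \frac{4}{\lambda^2 p} \times \begin{cases} s + \frac34 s^2 & \text{if } 0 < s \le \frac13,\\ \frac{1}{12} + \frac12 s + \frac32 s^2 & \text{if } \frac13 < s \le \frac12 \text{ and}\\ \frac{17}{24} - \log 2 - \log(1 - s) & \text{if } \frac12 < s < 1. \end{cases} \tag{9}$$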

From Proposition 1, the right side of (9) must be at least as large as the right side of (8) for all
$s \in (0, 1)$. In fact, it is a straightforward exercise to show analytically that the right side of (9)
is strictly greater than the right side of (8) for all $s \in (0, 1)$. Thus, as $p \to \infty$, the modified rigid
motion estimator $\tilde K_R$ performs nonnegligibly better than either the ordinary or the modified isotropic
estimator for any $t \in (0, Q)$ under the Poisson model, although the improvement over the modified
isotropic estimator is minor. Figure 1 shows the ratio of the asymptotic variances for $\tilde K_I(t)$ and
$\tilde K_R(t)$ under the Poisson model, which reaches a maximum of approximately 1.032 near $t = 0.247Q$.
The asymptotic results in (8) and (9) are unchanged if $\check K_R$ and $\check K_I$ replace $\tilde K_R$ and $\tilde K_I$.
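The ratio plotted in Figure 1 can be reproduced directly from (8) and (9). The Python sketch below assumes that both right sides share the factor $4/(\lambda^2 p)$, with $-\log(1-s)$ in (8) and, in (9), $s + \frac34 s^2$ for $0 < s \le \frac13$, $\frac1{12} + \frac12 s + \frac32 s^2$ for $\frac13 < s \le \frac12$ and $\frac{17}{24} - \log 2 - \log(1-s)$ for $\frac12 < s < 1$. The function names are illustrative only.

```python
import math

def amse_rigid(s):
    # Right side of (8) with the common factor 4/(lambda^2 p) dropped.
    return -math.log(1.0 - s)

def amse_isotropic(s):
    # Right side of (9) with the same factor dropped; piecewise in s = t/Q.
    if s <= 1.0 / 3.0:
        return s + 0.75 * s * s
    if s <= 0.5:
        return 1.0 / 12.0 + 0.5 * s + 1.5 * s * s
    return 17.0 / 24.0 - math.log(2.0) - math.log(1.0 - s)

def amse_ratio(s):
    # Curve of Figure 1: isotropic amse divided by rigid motion amse.
    return amse_isotropic(s) / amse_rigid(s)

grid = [i / 10000.0 for i in range(1, 10000)]
s_max = max(grid, key=amse_ratio)
print(f"maximum ratio {amse_ratio(s_max):.4f} at s = {s_max:.4f}")
```

Locating the maximum on a fine grid should recover the values quoted for Figure 1, and evaluating the ratio on the same grid illustrates numerically that (9) exceeds (8) throughout $(0, 1)$.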

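We next compare the modified and unmodified rigid motion estimators as $p \to \infty$ when all
$Q_j$'s equal $Q$. From (5),

$$E_\lambda\{\hat K(t) - 2t\}^2 \sim \frac{2}{\lambda^2 Q_{+p}^2}S_2(\phi_p) + \frac{4}{\lambda Q_{+p}^2}S_1(\phi_p) - \frac{16t^2}{\lambda Q_{+p}}.$$

Using (17) and (18) in the appendix then yields

$$E_\lambda\{\hat K_R(t) - 2t\}^2 \sim \frac{4}{\lambda^2 p}\left[-\log(1 - s) + 4\lambda Q\{\gamma(s) - s^2\}\right], \tag{10}$$

where

$$\gamma(s) = \frac14\int_0^1\left\{\int_0^1 \frac{1\{|x-y| < s\}}{1 - |x-y|}\,dy\right\}^2 dx. \tag{11}$$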



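Equation (19) in the appendix gives a more explicit expression for $\gamma$. Note that

$$\gamma(s) - s^2 = \frac14\int_0^1\left\{\int_0^1 \frac{1\{|x-y| < s\}}{1 - |x-y|}\,dy - 2s\right\}^2 dx,$$

which is strictly positive for all $s \in (0, 1]$, since the inner integral averages $2s$ over $x \in (0, 1)$ but is not constant in $x$.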

Comparing (8) and (10) shows that, in terms of mean squared error, the asymptotic relative
advantage of either modified rigid motion estimator over the unmodified rigid motion estimator
is proportional to $\lambda Q$, the expected number of events per segment. Figure 2 plots
$4\{\gamma(s) - s^2\}/\{-\log(1 - s)\}$, which is less than 0.124 for all $s \in (0, 1)$ and less than 0.061 for all $s < 0.9$.
Thus, at least for equal $Q_j$'s, we should not expect a large improvement under the Poisson model
due to the modifications when there are only 1.28 events per segment, as in the absorber catalog.
Simulation results in Section 6 show that larger improvements can occur with unequal $Q_j$'s.
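Both $\gamma(s)$ and the Figure 2 ratio $4\{\gamma(s) - s^2\}/\{-\log(1-s)\}$ are easy to evaluate numerically. The Python sketch below is one way to do so. It assumes the explicit form (19) from the appendix: $\gamma(s) = s + (1-2s)\log(1-s) - \frac12\log^2(1-s)$ for $s \le \frac12$, and $\gamma(s) = s - s\log s\log(1-s) + \int_0^{s-1/2}\log(\frac12 - y)\log(\frac12 + y)\,dy$ for $s > \frac12$. It checks this form against a direct quadrature of (11). The closed form used for the inner integral of (11), $-\log\{1 - (x\wedge s)\} - \log\{1 - ((1-x)\wedge s)\}$, follows by integrating $1/(1-v)$ on each side of $x$. The helper names are ours.

```python
import math

def inner(x, s):
    # Inner integral of (11) in closed form:
    # int_0^1 1{|x-y|<s}/(1-|x-y|) dy = -log(1 - min(x, s)) - log(1 - min(1-x, s)).
    return -math.log(1.0 - min(x, s)) - math.log(1.0 - min(1.0 - x, s))

def midpoint(f, a, b, n):
    # Composite midpoint rule for int_a^b f.
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def gamma_from_11(s, n=20000):
    # gamma(s) straight from definition (11).
    return 0.25 * midpoint(lambda x: inner(x, s) ** 2, 0.0, 1.0, n)

def gamma_from_19(s, n=4000):
    # gamma(s) from (19); for s > 1/2 the leftover integral is done numerically.
    L = math.log(1.0 - s)
    if s <= 0.5:
        return s + (1.0 - 2.0 * s) * L - 0.5 * L * L
    tail = midpoint(lambda y: math.log(0.5 - y) * math.log(0.5 + y), 0.0, s - 0.5, n)
    return s - s * math.log(s) * L + tail

def figure2_ratio(s):
    # 4{gamma(s) - s^2} / {-log(1 - s)}: relative amse increase per unit of lambda*Q.
    return 4.0 * (gamma_from_19(s) - s * s) / (-math.log(1.0 - s))

for s in (0.25, 0.5, 0.75, 0.9):
    print(s, gamma_from_19(s), gamma_from_11(s), figure2_ratio(s))
```

A plain midpoint rule is adequate here because the integrands are bounded for $s < 1$. The grid check of the ratio only illustrates the bounds quoted for Figure 2 numerically; it is not a proof.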
5. Some asymptotic theory for non-Poisson processes

There is a decided lack of asymptotic theory that permits useful comparisons of estimators of
$K$ when the underlying process is not Poisson. Stein (1995) derives results showing the advantage
of estimators like $\tilde K$ over those like $\hat K$, but the asymptotic approach taken there requires that
the distance $t$ at which one is estimating $K$ be large compared to the distances at which the
underlying process shows nontrivial dependence. When the observation window is made up of
many segments, especially if the $Q_j$'s are equal and the $M_j$'s are independent, it appears feasible
to develop some useful asymptotic results for non-Poisson processes. This section describes some
general asymptotic results for the estimators $\hat K$, $\tilde K$ and $\check K$ described in Section 3. These results
are used to demonstrate that if $M_1, M_2, \ldots$ are, conditional on $\Lambda_1, \Lambda_2, \ldots$, independent Poisson
processes with $M_j$ having intensity $\Lambda_j$, where the $\Lambda_j$'s are iid positive random variables, then as
$p \to \infty$, $\check K_R(t)$ is superior to $\tilde K_R(t)$, which is in turn superior to $\hat K_R(t)$.

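Suppose $M_1, M_2, \ldots$ are iid simple, stationary point processes on $\mathbb{R}$ with intensity $\lambda$ and
reduced second moment function $K$. Assume $Q = Q_1 = Q_2 = \cdots$ and let $X_{1j}, \ldots, X_{N_j j}$ be
the locations of the $N_j$ events from $M_j$ on $(0, Q)$. For a bounded, symmetric function $\phi$ on
$(0, Q) \times (0, Q)$, define $\Phi_j = \sum_{k \ne \ell}\phi(X_{kj}, X_{\ell j})$. Analogous to (1), suppose $E\Phi_j = \lambda^2 Q K(t)$ for any
reduced second moment function $K$ for the $M_j$'s. Define $G_j = (2t)^{-1}\sum_{k=1}^{N_j}\int_0^Q \phi(X_{kj}, y)\,dy$, so that
$EG_j = \lambda Q$. Using these definitions and writing $N_+ = \sum_{j=1}^p N_j$, the estimators described in Section 3 are given by

$$\hat K(t) = \frac{pQ\sum_{j=1}^p \Phi_j}{N_+(N_+ - 1)}, \qquad \tilde K(t) = \hat K(t) - \frac{4t\sum_{j=1}^p G_j}{N_+} + 4t$$

and

$$\check K(t) = \frac{pQ\sum_{j=1}^p \Phi_j}{\sum_{j=1}^p G_j\left(\sum_{j=1}^p G_j - 1\right)}.$$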
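For equal segment lengths and the rigid motion weight, the three estimators can be computed directly from event locations. The Python sketch below assumes the forms $\hat K(t) = pQ\sum_j\Phi_j/\{N_+(N_+-1)\}$, $\tilde K(t) = \hat K(t) - 4t\sum_j G_j/N_+ + 4t$ and $\check K(t) = pQ\sum_j\Phi_j/\{\sum_j G_j(\sum_j G_j - 1)\}$, with $N_+ = \sum_j N_j$. These forms are our reading of the definitions in this section, and the function names are hypothetical. For $G_j$ it uses the closed form $\int_0^Q\phi(x, y)\,dy = -Q\log\{1 - (x\wedge t)/Q\} - Q\log\{1 - ((Q-x)\wedge t)/Q\}$ for $\phi(x,y) = Q1\{|x-y|<t\}/(Q-|x-y|)$.

```python
import math
import random

def phi_rigid(x, y, t, Q):
    # Rigid motion weight for a pair in one segment of length Q:
    # phi(x, y) = Q 1{|x - y| < t} / (Q - |x - y|).
    d = abs(x - y)
    return Q / (Q - d) if d < t else 0.0

def phi_row_integral(x, t, Q):
    # int_0^Q phi(x, y) dy in closed form, used for G_j.
    return Q * (-math.log(1.0 - min(x, t) / Q) - math.log(1.0 - min(Q - x, t) / Q))

def k_estimates(segments, t, Q):
    """Return (K_hat, K_tilde, K_check) at distance t, 0 < t < Q.

    segments: list of lists of event locations in (0, Q), one list per segment.
    """
    p = len(segments)
    n_plus = sum(len(seg) for seg in segments)
    if n_plus < 2:
        raise ValueError("need at least two events")
    phi_sum = 0.0  # sum over j of Phi_j
    g_sum = 0.0    # sum over j of G_j
    for seg in segments:
        for k, x in enumerate(seg):
            for l, y in enumerate(seg):
                if k != l:
                    phi_sum += phi_rigid(x, y, t, Q)
            g_sum += phi_row_integral(x, t, Q) / (2.0 * t)
    k_hat = p * Q * phi_sum / (n_plus * (n_plus - 1))
    k_tilde = k_hat - 4.0 * t * g_sum / n_plus + 4.0 * t
    k_check = p * Q * phi_sum / (g_sum * (g_sum - 1.0))
    return k_hat, k_tilde, k_check

# Usage: 50 segments of a unit-intensity Poisson process, for which K(t) = 2t.
rng = random.Random(1)
Q, t = 2.55, 0.5
segments = []
for _ in range(50):
    pts, x = [], rng.expovariate(1.0)
    while x < Q:
        pts.append(x)
        x += rng.expovariate(1.0)
    segments.append(pts)
print(k_estimates(segments, t, Q))
```

The double loop over pairs is quadratic in the number of events per segment, which is harmless with only a few events per segment. For Poisson input all three values should land roughly near $2t$.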
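Furthermore, since $\{N_j, \Phi_j, G_j\}_{j=1}^\infty$ is an iid trivariate sequence, we can readily derive the limiting
distribution of these estimators. Specifically, if $E(N_1^4) < \infty$, then $\Phi_1$ and $G_1$ have finite second
moments, so as $p \to \infty$,

$$p^{1/2}\left\{\frac1p\sum_{j=1}^p\begin{pmatrix}N_j\\ \Phi_j\\ G_j\end{pmatrix} - \begin{pmatrix}\lambda Q\\ \lambda^2 Q K(t)\\ \lambda Q\end{pmatrix}\right\} \xrightarrow{d} N(0, \Sigma),$$

where $\xrightarrow{d}$ indicates convergence in distribution and $\Sigma$ is the $3 \times 3$ covariance matrix of $(N_1, \Phi_1, G_1)$.
Using first-order Taylor series, we get $\lambda Q p^{1/2}\{\hat K(t) - K(t)\} \xrightarrow{d} N(0, \hat V)$, $\lambda Q p^{1/2}\{\tilde K(t) - K(t)\} \xrightarrow{d}
N(0, \tilde V)$ and $\lambda Q p^{1/2}\{\check K(t) - K(t)\} \xrightarrow{d} N(0, \check V)$, where

$$\hat V = 4K(t)^2\,\mathrm{var}(N_1) + \frac{1}{\lambda^2}\mathrm{var}(\Phi_1) - \frac{4K(t)}{\lambda}\mathrm{cov}(N_1, \Phi_1), \tag{12}$$

$$\begin{aligned}\tilde V = {} & 4\{K(t) - 2t\}^2\,\mathrm{var}(N_1) + \frac{1}{\lambda^2}\mathrm{var}(\Phi_1) + 16t^2\,\mathrm{var}(G_1) - \frac{4\{K(t) - 2t\}}{\lambda}\mathrm{cov}(N_1, \Phi_1)\\ & - \frac{8t}{\lambda}\mathrm{cov}(\Phi_1, G_1) + 16t\{K(t) - 2t\}\,\mathrm{cov}(N_1, G_1)\end{aligned} \tag{13}$$

and

$$\check V = 4K(t)^2\,\mathrm{var}(G_1) + \frac{1}{\lambda^2}\mathrm{var}(\Phi_1) - \frac{4K(t)}{\lambda}\mathrm{cov}(\Phi_1, G_1). \tag{14}$$

As expected, $\tilde V = \check V$ when $K(t) = 2t$.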

To calculate the limiting behavior of these estimators for any given $\phi$, $Q$ and law of $M_1$, we
only have to compute the covariance matrix $\Sigma$ and plug the results into (12)-(14). In some limited
cases this computation can be done analytically, or more often by numerical integration; otherwise,
$\Sigma$ is easily approximated by simulation whenever $M_1$ can be readily simulated.
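When $\Sigma$ has no closed form, the recipe just described can be carried out by simulation. The Python sketch below is an illustrative example, not the computation used in this paper. It simulates segments of a homogeneous Poisson process, chosen only so that $K(t) = 2t$ is known. It then forms $(N_j, \Phi_j, G_j)$ for the rigid motion weight, estimates $\Sigma$ by the sample covariance, and plugs it into (12)-(14). As our own rewriting of those formulas, note that $\hat V = \mathrm{var}(\Phi_1/\lambda - 2KN_1)$, $\tilde V = \mathrm{var}(\Phi_1/\lambda - 2\{K - 2t\}N_1 - 4tG_1)$ and $\check V = \mathrm{var}(\Phi_1/\lambda - 2KG_1)$, so each is nonnegative for any valid $\Sigma$.

```python
import math
import random

def poisson_segment(rng, lam, Q):
    # Homogeneous Poisson process on (0, Q) built from exponential gaps.
    pts, x = [], rng.expovariate(lam)
    while x < Q:
        pts.append(x)
        x += rng.expovariate(lam)
    return pts

def segment_stats(pts, t, Q):
    # (N_j, Phi_j, G_j) for the rigid motion weight phi(x, y) = Q 1{|x-y|<t} / (Q - |x-y|).
    phi = sum(Q / (Q - abs(x - y)) for i, x in enumerate(pts)
              for k, y in enumerate(pts) if i != k and abs(x - y) < t)
    g = sum(Q * (-math.log(1 - min(x, t) / Q) - math.log(1 - min(Q - x, t) / Q))
            for x in pts) / (2 * t)
    return len(pts), phi, g

def sample_cov(rows):
    # Unbiased sample covariance matrix of a list of 3-tuples.
    n = len(rows)
    mean = [sum(r[i] for r in rows) / n for i in range(3)]
    return [[sum((r[i] - mean[i]) * (r[k] - mean[k]) for r in rows) / (n - 1)
             for k in range(3)] for i in range(3)]

def limiting_variances(S, lam, K, t):
    # Plug Sigma, ordered as (N_1, Phi_1, G_1), into (12)-(14).
    vN, vP, vG = S[0][0], S[1][1], S[2][2]
    cNP, cNG, cPG = S[0][1], S[0][2], S[1][2]
    D = K - 2 * t
    v_hat = 4 * K**2 * vN + vP / lam**2 - 4 * K / lam * cNP
    v_tilde = (4 * D**2 * vN + vP / lam**2 + 16 * t**2 * vG - 4 * D / lam * cNP
               - 8 * t / lam * cPG + 16 * t * D * cNG)
    v_check = 4 * K**2 * vG + vP / lam**2 - 4 * K / lam * cPG
    return v_hat, v_tilde, v_check

rng = random.Random(2024)
lam, Q, t = 1.0, 2.55, 0.5
rows = [segment_stats(poisson_segment(rng, lam, Q), t, Q) for _ in range(20000)]
S = sample_cov(rows)
v_hat, v_tilde, v_check = limiting_variances(S, lam, 2 * t, t)  # Poisson: K(t) = 2t
print(v_hat, v_tilde, v_check)
```

For a non-Poisson law, only `poisson_segment` and the known value of $K(t)$ would need to change. With $K(t) = 2t$, the estimated $\tilde V$ and $\check V$ coincide exactly, as the text notes.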

We now consider a simple setting in which $\Sigma$ can be explicitly derived. Suppose $M_1, M_2, \ldots$
are, conditional on $\Lambda_1, \Lambda_2, \ldots$, independent Poisson processes with $M_j$ having intensity $\Lambda_j$, where
the $\Lambda_j$'s are iid positive random variables. Such a model could serve as an approximation for a Cox
process (Daley and Vere-Jones 1988, Section 8.5) observed over widely spaced segments where the
random intensity function $\Lambda(\cdot)$ of the process has little variation over distances of length $Q$ but
the segments are sufficiently spaced so that the behavior of $\Lambda(\cdot)$ in different segments is essentially
independent.

Next, suppose $\phi(x, y) = Q1\{|x - y| < t\}/(Q - |x - y|)$, so that we are using the rigid motion
estimator. In this case, the elements of $\Sigma$ can be readily calculated in terms of the moments of $\Lambda_1$.
Writing $m_j$ for $E(\Lambda_1^j)$, we have $\lambda = m_1$, $K(t) = 2tm_2/m_1^2$,

$$\mathrm{var}(N_1) = Qm_1 + Q^2(m_2 - m_1^2),$$

$$\mathrm{var}(\Phi_1) = 16Q^3\gamma\left(\frac tQ\right)m_3 - 4Q^2\log\left(1 - \frac tQ\right)m_2 + 4t^2Q^2(m_4 - m_2^2),$$

$$\mathrm{var}(G_1) = \frac{Q^3}{t^2}\gamma\left(\frac tQ\right)m_1 + Q^2(m_2 - m_1^2),$$

$$\mathrm{cov}(N_1, \Phi_1) = 4tQm_2 + 2tQ^2(m_3 - m_1m_2),$$

$$\mathrm{cov}(N_1, G_1) = Qm_1 + Q^2(m_2 - m_1^2)$$

and

$$\mathrm{cov}(\Phi_1, G_1) = \frac{4Q^3}{t}\gamma\left(\frac tQ\right)m_2 + 2tQ^2(m_3 - m_1m_2).$$



Each of these results can be obtained by conditioning on $\Lambda_1$. For example,

$$\begin{aligned}\mathrm{var}(\Phi_1) &= E\{\mathrm{var}(\Phi_1 \mid \Lambda_1)\} + \mathrm{var}\{E(\Phi_1 \mid \Lambda_1)\}\\ &= E\left[4\Lambda_1^3\int_0^Q\left\{\int_0^Q \phi(x, y)\,dy\right\}^2 dx + 2\Lambda_1^2\int_0^Q\int_0^Q \phi(x, y)^2\,dx\,dy\right] + \mathrm{var}(2t\Lambda_1^2 Q)\\ &= 16Q^3\gamma\left(\frac tQ\right)m_3 - 4Q^2\log\left(1 - \frac tQ\right)m_2 + 4t^2Q^2(m_4 - m_2^2),\end{aligned}$$

where the second step follows from (10) in Ripley (1988, p. 30) and the last step uses (17) and (18)
in the appendix.

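Plugging these results into (12)-(14) yields

$$\hat V_R = \frac{\mathrm{var}(\Phi_1)}{m_1^2} - \frac{16t^2Qm_2^2}{m_1^3} + \frac{16t^2Q^2m_2^2(m_2 - m_1^2)}{m_1^4} - \frac{16t^2Q^2m_2(m_3 - m_1m_2)}{m_1^3},$$

$$\tilde V_R = \frac{\mathrm{var}(\Phi_1)}{m_1^2} - \frac{16t^2Q(m_2 - m_1^2)^2}{m_1^3} - 16Q^3\gamma\left(\frac tQ\right)\left(\frac{2m_2}{m_1} - m_1\right) + \frac{16t^2Q^2m_2^2(m_2 - m_1^2)}{m_1^4} - \frac{16t^2Q^2m_2(m_3 - m_1m_2)}{m_1^3}$$

and

$$\check V_R = \frac{\mathrm{var}(\Phi_1)}{m_1^2} - \frac{16Q^3\gamma(t/Q)m_2^2}{m_1^3} + \frac{16t^2Q^2m_2^2(m_2 - m_1^2)}{m_1^4} - \frac{16t^2Q^2m_2(m_3 - m_1m_2)}{m_1^3},$$

where the subscript $R$ indicates that the asymptotic variance is for the appropriate version of the
rigid motion estimator. Thus,

$$\hat V_R - \tilde V_R = 16Q^3\left(\frac{2m_2}{m_1} - m_1\right)\left\{\gamma\left(\frac tQ\right) - \frac{t^2}{Q^2}\right\}, \tag{15}$$

which is positive for $t/Q \in (0, 1)$ since $\gamma(s) - s^2 > 0$ for $s \in (0, 1)$ and $m_2 \ge m_1^2$. Furthermore,

$$\tilde V_R - \check V_R = \frac{16Q^3(m_2 - m_1^2)^2}{m_1^3}\left\{\gamma\left(\frac tQ\right) - \frac{t^2}{Q^2}\right\}, \tag{16}$$

which is positive for $t/Q \in (0, 1)$ whenever $m_2 > m_1^2$. Thus, $\tilde V_R > \check V_R$ unless $\mathrm{var}\,\Lambda_1 = 0$, in which case
$m_2 = m_1^2$ and $\tilde V_R = \check V_R$.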
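The algebra behind (15) and (16) can be checked numerically. The Python sketch below builds $\Sigma$ from the moment formulas for the rigid motion weight, plugs it into (12)-(14), and compares the differences of the limiting variances with $16Q^3(2m_2/m_1 - m_1)\{\gamma(t/Q) - t^2/Q^2\}$ for (15) and $16Q^3(m_2 - m_1^2)^2m_1^{-3}\{\gamma(t/Q) - t^2/Q^2\}$ for (16). The gamma law for $\Lambda_1$ (shape 2, scale 1/2) is an arbitrary illustrative choice, and the function names are ours.

```python
import math

def gamma_fn(s, n=4000):
    # gamma(s) from (19); the leftover integral for s > 1/2 by the midpoint rule.
    L = math.log(1.0 - s)
    if s <= 0.5:
        return s + (1.0 - 2.0 * s) * L - 0.5 * L * L
    h = (s - 0.5) / n
    tail = h * sum(math.log(0.5 - (i + 0.5) * h) * math.log(0.5 + (i + 0.5) * h)
                   for i in range(n))
    return s - s * math.log(s) * L + tail

def sigma_mixed_poisson(m, t, Q):
    # Sigma of (N_1, Phi_1, G_1) for the rigid motion weight when M_1 is Poisson
    # given Lambda_1; m[j] = E(Lambda_1^j) for j = 1, ..., 4.
    g = gamma_fn(t / Q)
    m1, m2, m3, m4 = m[1], m[2], m[3], m[4]
    w = Q**2 * (m2 - m1**2)
    vN = Q * m1 + w
    vP = (16 * Q**3 * g * m3 - 4 * Q**2 * math.log(1 - t / Q) * m2
          + 4 * t**2 * Q**2 * (m4 - m2**2))
    vG = Q**3 / t**2 * g * m1 + w
    cNP = 4 * t * Q * m2 + 2 * t * Q**2 * (m3 - m1 * m2)
    cNG = Q * m1 + w
    cPG = 4 * Q**3 / t * g * m2 + 2 * t * Q**2 * (m3 - m1 * m2)
    return [[vN, cNP, cNG], [cNP, vP, cPG], [cNG, cPG, vG]]

def limiting_variances(S, lam, K, t):
    # (12)-(14) with Sigma ordered as (N_1, Phi_1, G_1).
    vN, vP, vG = S[0][0], S[1][1], S[2][2]
    cNP, cNG, cPG = S[0][1], S[0][2], S[1][2]
    D = K - 2 * t
    v_hat = 4 * K**2 * vN + vP / lam**2 - 4 * K / lam * cNP
    v_tilde = (4 * D**2 * vN + vP / lam**2 + 16 * t**2 * vG - 4 * D / lam * cNP
               - 8 * t / lam * cPG + 16 * t * D * cNG)
    v_check = 4 * K**2 * vG + vP / lam**2 - 4 * K / lam * cPG
    return v_hat, v_tilde, v_check

def gamma_moments(shape, scale):
    # E(Lambda^j), j = 0, ..., 4, for Lambda with a gamma(shape, scale) law.
    return [scale**j * math.gamma(shape + j) / math.gamma(shape) for j in range(5)]

Q, t = 2.55, 0.5
m = gamma_moments(2.0, 0.5)            # mean 1, variance 1/2 (illustrative choice)
lam, K = m[1], 2 * t * m[2] / m[1]**2
v_hat, v_tilde, v_check = limiting_variances(sigma_mixed_poisson(m, t, Q), lam, K, t)
excess = gamma_fn(t / Q) - (t / Q)**2
rhs15 = 16 * Q**3 * (2 * m[2] / m[1] - m[1]) * excess
rhs16 = 16 * Q**3 * (m[2] - m[1]**2)**2 / m[1]**3 * excess
print(v_hat - v_tilde, rhs15)
print(v_tilde - v_check, rhs16)
```

Setting every moment to 1, that is, a constant intensity, gives the Poisson case, where the second difference vanishes.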

The arguments in this section largely carry over to estimators for the reduced second moment
function of iid point processes on $\mathbb{R}^d$ observed over $\bigcup_{j=1}^p (A \times \{j\})$ for some $A \subset \mathbb{R}^d$. In particular, (12)-(14)
still hold if, at the appropriate places, $2t$ is replaced by $\mu_d t^d$, the volume of a ball of radius
$t$ in $\mathbb{R}^d$. Furthermore, the comparisons between $\hat V_R$, $\tilde V_R$ and $\check V_R$ in (15) and (16) still hold after
replacing $\gamma(t/Q) - t^2/Q^2$ by $\int_A\{\int_A \phi(x, y)\,dy - \mu_d t^d\}^2 dx$.

6. Simulation study 

The asymptotic results in the preceding two sections provide only limited information about
the relative advantages of the various estimators, especially for non-Poisson processes or unequal
$Q_j$'s. Because the estimators $\hat K_R$, $\tilde K_R$ and $\check K_R$ can all be explicitly calculated, it is fairly
straightforward to study their behavior via simulation. This section reports some results
from a simulation study that considers equal and unequal $Q_j$'s and three models for the law of the
point processes. For the unequal segment length case, $p = 50$ and $Q_j = 0.1j$ for $j = 1, \ldots, p$, and
for the equal segment length case, $p = 50$ and each $Q_j = 2.55$, so that $Q_+ = 127.5$ in both cases.
The three processes reported on here are all stationary renewal processes; that is, the waiting times
between consecutive events are iid random variables. In each case, the intensity of the process is
1, so that $EN_+ = 127.5$ in all simulations. Stationary renewal processes are straightforward to
simulate on an interval $[0, Q]$. If $F$ is the cdf (cumulative distribution function) for the waiting
times and $\mu < \infty$ is the mean waiting time, then to obtain a stationary process on $[0, \infty)$, use
$\mu^{-1}\int_0^x\{1 - F(y)\}\,dy$ for the cdf of the time of the first event after 0 (Daley and Vere-Jones 1988,
p. 107). Simulate a random variable from this distribution; if it is greater than $Q$ then one is done
and there are no events in $[0, Q]$ for this realization of the process. If not, simulate random waiting
times with cdf $F$ until one gets the first event after $Q$ and use the preceding events as the realization
of the process on $[0, Q]$. Here, we consider waiting time densities $f$ that are exponential with mean
1 (in which case the $M_j$'s are Poisson processes), $f(x) = 4xe^{-2x}$ for $x > 0$ (a gamma density with
parameters 2 and $\frac12$) and $f(x) = 24/(2 + x)^4$ for $x > 0$. Figure 3 plots $K(t) - 2t$ for renewal processes
with the last two waiting time densities, which shows that the first of these corresponds to a process
more regular than the Poisson and the second is more clumped than the Poisson. For the gamma
waiting times, it is possible to show that for $x \ne 0$, $P\{M_1(dx) = 1 \mid M_1(\{0\}) = 1\} = (1 - e^{-4|x|})\,dx$ and
hence that $K(t) = 2t - \frac12(1 - e^{-4t})$. For the third waiting time density, we cannot give an analytic
expression for $K(t)$, although Theorem 1 in Feller (1971, p. 366) implies that $K(t) - 2t \to 2$ as
$t \to \infty$. The values for $K(t)$ in Figure 3 for this process were obtained by simulation. Since the
mean waiting times are all equal, the variances of the waiting times provide another measure of
clumpiness, with larger variances corresponding to a clumpier process. For the exponential waiting
times, the variance is 1, for the gamma case, the variance is $\frac12$ and for the last case the variance is
3.
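The simulation recipe for stationary renewal processes takes only a few lines. The Python sketch below is one possible implementation for the three waiting time laws used here. It relies on two standard facts that the text does not state. First, the equilibrium density $\{1 - F(x)\}/\mu$ is the law of $UL$, where $U$ is uniform on $(0, 1)$ and $L$ has the length-biased density $xf(x)/\mu$; for a gamma law with shape 2, $L$ is gamma with shape 3. Second, for $f(x) = 24/(2+x)^4$ both $F$ and the equilibrium cdf $1 - 4/(2+x)^2$ can be inverted explicitly.

```python
import math
import random

def renewal_on_interval(rng, Q, first, wait):
    # Stationary renewal process on [0, Q] following the recipe in the text: the first
    # event time comes from the equilibrium cdf, later ones add iid waiting times.
    events = []
    x = first(rng)
    while x <= Q:
        events.append(x)
        x += wait(rng)
    return events

# Exponential waiting times with mean 1: the equilibrium law is again exponential.
exp_first = lambda rng: rng.expovariate(1.0)
exp_wait = lambda rng: rng.expovariate(1.0)

# Density 4x exp(-2x), i.e. gamma with shape 2 and scale 1/2. The equilibrium density
# (1 - F(x))/mu is that of U * Gamma(3, 1/2) with U uniform on (0, 1).
gamma_first = lambda rng: rng.random() * rng.gammavariate(3.0, 0.5)
gamma_wait = lambda rng: rng.gammavariate(2.0, 0.5)

# Density 24/(2 + x)^4: 1 - F(x) = 8/(2 + x)^3 and the equilibrium cdf is 1 - 4/(2 + x)^2,
# so both are sampled by inversion (1 - rng.random() lies in (0, 1]).
pareto_first = lambda rng: 2.0 / math.sqrt(1.0 - rng.random()) - 2.0
pareto_wait = lambda rng: 2.0 * (1.0 - rng.random()) ** (-1.0 / 3.0) - 2.0

rng = random.Random(7)
Q = 2.55
for name, first, wait in [("exponential", exp_first, exp_wait),
                          ("gamma", gamma_first, gamma_wait),
                          ("24/(2+x)^4", pareto_first, pareto_wait)]:
    counts = [len(renewal_on_interval(rng, Q, first, wait)) for _ in range(40000)]
    print(name, sum(counts) / len(counts))
```

Because the intensity is 1 in every case, the average number of events on $[0, Q]$ should be close to $Q$, which gives a quick sanity check.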

Figures 4-6 show the results of simulations for both sets of segment lengths and all three
processes. For each scenario, the three estimators were calculated at a range of distances for 10,000
simulations. Generally speaking, $\tilde K_R$ and $\check K_R$ behave similarly and are superior to $\hat K_R$, especially
at longer distances when the $Q_j$'s are unequal. Figure 4 shows the mean squared errors for $\hat K_R$.
In all cases, the contributions of the squared biases to the mean squared errors are practically
negligible and are always less than 0.5%. As expected, the mean squared errors grow with $t$,
especially for the unequal segment length case as $t$ gets near 5, the longest segment length available.
Another expected result is that the mean squared errors increase with increasing clumpiness of the
underlying process. Figure 5 compares $\hat K_R$ and $\tilde K_R$. We see that $\tilde K_R$ is generally superior, although
$\hat K_R$ is sometimes slightly better for smaller $t$. The relative advantage of $\tilde K_R$ (and $\check K_R$) over $\hat K_R$
tends to be greater for more regular processes, which qualitatively agrees with the asymptotic
results in Stein (1995). The advantage also tends to be greater for unequal segment lengths,
demonstrating that theoretical results obtained for equal segment lengths may not accurately
reflect the differences between estimators when segment lengths are unequal. Figure 6 compares
$\tilde K_R$ and $\check K_R$. From the theoretical results in the previous section, we should expect these estimators
to behave similarly when the waiting time density is exponential, so that the underlying model is
Poisson. The simulations show that the estimators also tend to behave very similarly for some
non-Poisson models, especially when the segment lengths are equal. Neither estimator dominates
the other, although $\check K_R$ tends to be slightly superior for $t$ nearly as large as the longest segment
length.

For highly regular processes, $\tilde K_R$ can be substantially inferior to either $\hat K_R$ or $\check K_R$ for $t$
sufficiently small. The problem is caused by the fact that in such circumstances, having a pair
of events within $t$ of each other is rare, so that $\mathrm{var}\{T(\phi)\}$ is much smaller than under a Poisson
model with the same intensity, whereas the variance of

$$T(\phi) - T^*(\phi) = 2(N_+ - 1)\sum_i\{h(X_i, L_i; \phi) - Eh(X, L; \phi)\}$$

is not much different for a highly regular process than for a Poisson process. As a consequence,
subtracting off $T(\phi) - T^*(\phi)$ from $T(\phi)$ tends to inflate the variance of the estimator. As an
example of a highly regular process, consider the stationary renewal process with waiting time
density $\frac{6^6}{5!}x^5e^{-6x}$ for $x > 0$, a gamma density with parameters 6 and $\frac16$. This waiting time
distribution has mean 1 and variance $\frac16$ and corresponds to a highly regular point process. It is
possible to show that

$$K(t) = 2t - \frac56 + \frac16 e^{-12t} + \frac13\cos(3^{3/2}t)(e^{-9t} + e^{-3t}) + \frac{1}{\sqrt3}\sin(3^{3/2}t)\left(\frac13 e^{-9t} + e^{-3t}\right)$$

for this process. Figure 7 shows that $\tilde K_R$ is notably inferior to either $\hat K_R$ or $\check K_R$ for $t$ sufficiently
small; for larger $t$, it is competitive with $\check K_R$ and clearly superior to $\hat K_R$. The overall winner is
$\check K_R$, which performs well for all $t$.
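The closed form for $K(t)$ under gamma waiting times with parameters 6 and $\frac16$ can be checked against the renewal density $u(x) = \sum_{n \ge 1} f^{*n}(x)$, since $K(t) = 2\int_0^t u(x)\,dx$ for a stationary renewal process with intensity 1 on the line. The Python sketch below is an illustrative check rather than the authors' derivation. It uses the fact that the $n$-fold convolution of a gamma density with shape 6 and rate 6 is a gamma density with shape $6n$ and the same rate.

```python
import math

def k_gamma6(t):
    # Closed form quoted in the text for gamma(6, 1/6) waiting times.
    b = 3 ** 1.5
    return (2 * t - 5 / 6 + math.exp(-12 * t) / 6
            + math.cos(b * t) * (math.exp(-9 * t) + math.exp(-3 * t)) / 3
            + math.sin(b * t) * (math.exp(-9 * t) / 3 + math.exp(-3 * t)) / math.sqrt(3))

def renewal_density(x, shape=6, rate=6.0, terms=40):
    # u(x) = sum_n f^{*n}(x); f^{*n} is the gamma density with shape n*shape and the same rate.
    total = 0.0
    for n in range(1, terms + 1):
        a = n * shape
        total += math.exp(a * math.log(rate) + (a - 1) * math.log(x) - rate * x - math.lgamma(a))
    return total

def k_numeric(t, steps=4000):
    # K(t) = 2 int_0^t u(x) dx (intensity 1), by the midpoint rule.
    h = t / steps
    return 2 * h * sum(renewal_density((i + 0.5) * h) for i in range(steps))

for t in (0.25, 0.5, 1.0, 2.0):
    print(t, k_gamma6(t), k_numeric(t))
```

The check also confirms that $K(0) = 0$ and $K(t) - 2t \to -\frac56$, which agrees with the renewal-theory limit $(\sigma^2 - \mu^2)/\mu^2$ for mean $\mu = 1$ and variance $\sigma^2 = \frac16$.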

We are unaware of any circumstances in which $\check K_R$ performs substantially worse than either
$\hat K_R$ or $\tilde K_R$. Thus, we recommend routinely using $\check K_R$ to estimate $K$, although routine adoption
for processes in more than one dimension will require the development of the necessary software.

7. Application to absorber catalog 

Figure 8 displays the estimators $\hat K_R$, $\tilde K_R$ and $\check K_R$ as applied to the absorber catalog described
in Section 2. The three estimators are very similar and, as expected, show clear evidence of
clustering of absorbers. To obtain some idea about the uncertainty of these estimates, as in Quashnock
and Stein (1999), approximate 95% pointwise confidence intervals were obtained by bootstrapping
using the 274 segments as the sampling units. Specifically, using the notation in Section 5,
simulated absorber catalogs were produced by sampling with replacement from $(Q_j; X_{1j}, \ldots, X_{N_j j})$ for
$j = 1, \ldots, 274$, so that when one selects a segment, one automatically selects the absorber locations
that go with this segment. The confidence bands displayed in Figure 8 are then what Davison and
Hinkley (1997, p. 29) call the basic bootstrap confidence limits and are based on 999 simulated
catalogs. All three estimators yield similar confidence intervals, which is disappointing but
perhaps not unexpected given the strong clustering that exists in the absorber catalog and the finding
in the simulation study that the advantage of the modifications decreases as clustering increases.
For these bootstrapping intervals to be appropriate, $(Q_j; X_{1j}, \ldots, X_{N_j j})$ for $j = 1, \ldots, 274$ should
be iid random objects. Since the segments are of widely varying lengths, if the $Q_j$'s are viewed
as fixed, the identically distributed assumption is false. However, if we view the $Q_j$'s as being
a sequence of iid positive random variables that are independent of the locations of absorbers,
then the identically distributed assumption may be reasonable. Whether or not the independence
assumption is reasonable depends on the spatial extent of clustering among absorbers. If there
is no spatial dependence in absorber locations beyond, say, 100 $h^{-1}$ Mpc, then the independence
assumption is not seriously in error, since few pairs of segments are within this distance of each
other. If, however, nonnegligible clustering exists well beyond 100 $h^{-1}$ Mpc, then the independence
assumption is more problematic.
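The segment-level bootstrap is simple to code. The Python sketch below is a generic version, not the code used for Figures 8 and 9. Here `segments` holds $(Q_j, [X_{1j}, \ldots, X_{N_jj}])$ pairs, and `statistic` is any user-supplied function of a catalog (hypothetical here), for instance an estimate of $K(t) - K(100) - 2(t - 100)$ at a fixed $t$. The basic interval is $(2\hat\theta - \theta^*_{((B+1)(1-\alpha/2))},\ 2\hat\theta - \theta^*_{((B+1)\alpha/2)})$, with $\theta^*_{(r)}$ the $r$th smallest of the $B$ bootstrap replicates, following Davison and Hinkley (1997).

```python
import random

def basic_bootstrap_interval(segments, statistic, n_boot=999, alpha=0.05, seed=0):
    """Basic bootstrap interval that resamples whole segments with replacement.

    segments:  list of (Q_j, [X_1j, ..., X_Njj]) pairs; drawing a segment brings its
               events along with it.
    statistic: user-supplied function of such a list (hypothetical here).
    """
    rng = random.Random(seed)
    p = len(segments)
    theta_hat = statistic(segments)
    replicates = sorted(statistic([segments[rng.randrange(p)] for _ in range(p)])
                        for _ in range(n_boot))
    # 1-based ranks (B + 1)(alpha/2) and (B + 1)(1 - alpha/2); 25 and 975 for B = 999.
    lo_rank = int(round((n_boot + 1) * alpha / 2))
    hi_rank = int(round((n_boot + 1) * (1 - alpha / 2)))
    return (2 * theta_hat - replicates[hi_rank - 1],
            2 * theta_hat - replicates[lo_rank - 1])

# Usage on a made-up catalog with a simple statistic (events per unit length).
rng = random.Random(3)
toy = []
for _ in range(274):
    q = rng.uniform(0.5, 5.0)
    toy.append((q, sorted(rng.uniform(0.0, q) for _ in range(rng.randint(0, 3)))))
rate = lambda segs: sum(len(x) for _, x in segs) / sum(q for q, _ in segs)
print(basic_bootstrap_interval(toy, rate))
```

Resampling whole segments keeps each segment's absorber locations together, which is exactly what makes the iid assumption on $(Q_j; X_{1j}, \ldots, X_{N_jj})$ the relevant one.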

Analyses of galaxy surveys (Davis and Peebles 1983, Loveday, et al. 1995) show that visible
matter clusters on scales of up to 20 $h^{-1}$ Mpc. Thus, it is more interesting to investigate how
$K(t) - 2t$ changes at distances beyond 20 $h^{-1}$ Mpc than to look at $K$ itself. Figure 8 shows that
$\check K_R(t) - 2t$ generally increases until about 200 $h^{-1}$ Mpc, and it is important to assess the uncertainty
in this pattern. Applying the bootstrapping procedure to $\hat K_R(t) - \hat K_R(t_0)$ for $t_0 = 20, 50, 100$ and
150 $h^{-1}$ Mpc, Quashnock and Stein (1999) concluded that there was strong evidence for clustering
from 20 to 50 $h^{-1}$ Mpc and from 50 to 100 $h^{-1}$ Mpc, but at best marginal evidence for clustering
beyond 100 $h^{-1}$ Mpc. The results with the modified estimates (not shown) confirm the clear
evidence for clustering from 20 to 50 $h^{-1}$ Mpc and from 50 to 100 $h^{-1}$ Mpc. Figure 9 shows the
lower bounds for pointwise 95% confidence intervals for $K(t) - K(100) - 2(t - 100)$. The modified
estimators yield slightly stronger evidence of clustering beyond 100 $h^{-1}$ Mpc, which is mostly due
to the fact that the modified estimates of $K(t) - K(100) - 2(t - 100)$ are slightly larger than the
unmodified estimates for $t$ around 200 and not because the modified intervals are narrower. If one
used 99% pointwise confidence intervals in Figure 9, then for all $t > 100$ and all three estimators,
the lower confidence bounds are negative. Thus, the conclusion in Quashnock and Stein (1999)
that there is perhaps marginal evidence for clustering beyond 100 $h^{-1}$ Mpc is not altered by using
the modified estimators.
As discussed in Section 2, the broad range of redshifts in the absorber catalog implies that we
are looking at the universe at a broad range of times. The use of comoving units largely equalizes
the intensity of absorbers across redshifts, but it does not equalize the clustering. Indeed, by
dividing the absorber catalog into groups based on their redshift, Quashnock and Vanden Berk
(1998) found evidence that as redshift decreases, clustering on scales of 1 to 16 $h^{-1}$ Mpc
strongly increases across the range of redshifts in the absorber catalog. Quashnock and Vanden
Berk (1998) further note that this increase in clustering with decreasing redshift is consistent with
what is known through theory and simulations about how gravity should affect the evolution of
the clustering of absorbers over time. Using the various forms of the rigid motion estimator of $K$
described here on groups of the absorber catalog with similar redshifts, we also find that on the
scale of a few tens of $h^{-1}$ Mpc, clustering increases substantially with decreasing redshift over the
range of redshifts in the absorber catalog (results not shown). Thus, on these shorter scales, our
estimates of $K$ measure an average clustering over the range of redshifts in the absorber catalog.

In contrast, Quashnock, Vanden Berk and York (1996) found no evidence that clustering at
scales of 100 $h^{-1}$ Mpc changes over the redshift range in the absorber catalog. Similarly, when
looking at, say, $\hat K_R(t) - \hat K_R(100)$ for $t > 100$ based on higher and lower redshift parts of the catalog,
we find no systematic difference in the estimates as a function of redshift. For example, dividing the
274 segments in the catalog into two groups of size 137 based on redshift, $\hat K_R(150) - \hat K_R(100)$ equals
150.8 for the lower redshift group and 151.4 for the higher redshift group. Thus, we do not believe
that the modest evidence we find for clustering at these larger scales is due to inhomogeneities
across time in the distribution of absorbers.

8. Summary 

For studying the behavior of edge-corrected estimators of the $K$ function of a point process,
taking the observation domain to be a sequence of segments has a number of desirable consequences.
First, explicit expressions are available for a number of the more popular estimators, which is often
not the case for regions in more than one dimension. The availability of such explicit expressions
eases the study of the properties of these estimators via both theory and simulation. In addition,
studying settings in which the number of segments is large yields results that highlight the
differences between the various methods of edge correction. In particular, simulation results show that
allowing the segment lengths to vary generally increases the differences between estimators. The
overall conclusion about the merits of the various estimators is that $\check K_R$, a modification of the rigid
motion estimator based on an approach suggested by Picka (1996), is the estimator of choice.

The absorber catalog studied here shows that multiple windows of varying size can arise in 
practice. Although it is somewhat disappointing that the bootstrap confidence intervals for the 
ordinary rigid motion corrected estimator and its modifications are very similar, this result is not 
too surprising in light of the simulation results showing that the benefit of the modifications is 
smaller for clustered processes. The simulation results indicate that the modified estimators can 
have substantially smaller mean squared errors for Poisson or more regular processes, especially if 
the segment lengths vary substantially. 

Appendix. Proofs 

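We first derive (4) assuming, for convenience, the $Q_j$'s have been arranged in increasing
order, and set $Q_0 = 0$. For $0 < x < Q_\ell$, we have

$$\int_0^{Q_\ell}\frac{1\{|x-y| < t\}}{U(|x-y|)}\,dy = \int_0^x\frac{1\{|x-y| < t\}}{U(|x-y|)}\,dy + \int_0^{Q_\ell - x}\frac{1\{|Q_\ell - x - y| < t\}}{U(|Q_\ell - x - y|)}\,dy.$$

Thus, to verify (4), we need to show that

$$\int_0^x\frac{1\{|x-y| < t\}}{U(|x-y|)}\,dy = k(x, t).$$

Now

$$\int_0^x\frac{1\{|x-y| < t\}}{U(|x-y|)}\,dy = \int_{(x-t)_+}^x\frac{dy}{U(x-y)} = \sum_{k=1}^{j(x\wedge t)-1}\int_{x-Q_k}^{x-Q_{k-1}}\frac{dy}{\sum_{j=k}^p(Q_j - x + y)} + \int_{(x-t)_+}^{x-Q_{j(x\wedge t)-1}}\frac{dy}{\sum_{j=j(x\wedge t)}^p(Q_j - x + y)},$$

which equals $k(x, t)$ by calculus.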

We next derive $S_2(\phi_R)$, again assuming the $Q_j$'s have been arranged in increasing order. By
the symmetry of $\phi_R$,

$$S_2(\phi_R) = 2Q_+^2\sum_{\ell=1}^p\int_0^{Q_\ell}\int_0^x\frac{1\{x - y < t\}}{U(x-y)^2}\,dy\,dx,$$

so taking $v = x - y$ and then switching the order of integration yields

$$S_2(\phi_R) = 2Q_+^2\sum_{\ell=1}^p\int_0^{t\wedge Q_\ell}\frac{Q_\ell - v}{U(v)^2}\,dv = 2Q_+^2\int_0^t\frac{dv}{U(v)},$$

where the last step uses $\sum_{\ell=1}^p (Q_\ell - v)_+ = U(v)$. With $Q_0 = 0$ and $U_\ell = U(Q_\ell)$, $U$ is linear with slope
$-(p - \ell + 1)$ between $Q_{\ell-1}$ and $Q_\ell$, so splitting the range of integration at $Q_1, \ldots, Q_{j(t)-1}$ gives

$$S_2(\phi_R) = 2Q_+^2\left\{\sum_{\ell=1}^{j(t)-1}\frac{1}{p - \ell + 1}\log\frac{U_{\ell-1}}{U_\ell} + \frac{1}{p - j(t) + 1}\log\frac{U_{j(t)-1}}{U(t)}\right\}.$$

If $Q_1 = \cdots = Q_p = Q$, then for $t < Q$, $j(t) = 1$, so

$$S_2(\phi_R) = -2pQ^2\log\left(1 - \frac tQ\right). \tag{17}$$
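The piecewise-logarithmic expression for $S_2(\phi_R)$ is easy to check against a direct numerical evaluation of $2Q_+^2\int_0^t dv/U(v)$, to which the double integral reduces because $\sum_\ell (Q_\ell - v)_+ = U(v)$. The Python sketch below does this for the unequal segment lengths of Section 6 and for equal lengths, where it should agree with (17). It assumes $j(t) = \min\{j : Q_j > t\}$ with the $Q_j$'s sorted, $Q_0 = 0$, $U_\ell = U(Q_\ell)$ and $t < \max_j Q_j$. The helper names are ours.

```python
import math

def U(v, Q):
    # U(v) = sum_j (Q_j - v)_+
    return sum(max(q - v, 0.0) for q in Q)

def s2_rigid_closed(t, Q):
    # Piecewise-logarithmic expression for S_2(phi_R); needs 0 < t < max(Q).
    Q = sorted(Q)
    p, q_plus = len(Q), sum(Q)
    knots = [0.0] + Q              # knots[l] = Q_l, with Q_0 = 0
    total, ell = 0.0, 1
    while Q[ell - 1] <= t:         # whole intervals [Q_{l-1}, Q_l] lying below t
        total += math.log(U(knots[ell - 1], Q) / U(knots[ell], Q)) / (p - ell + 1)
        ell += 1
    # Here ell = j(t) = min{j : Q_j > t}; add the last, partial interval.
    total += math.log(U(knots[ell - 1], Q) / U(t, Q)) / (p - ell + 1)
    return 2.0 * q_plus**2 * total

def s2_rigid_numeric(t, Q, steps=5000):
    # 2 Q_+^2 int_0^t dv / U(v) by the midpoint rule.
    h = t / steps
    return 2.0 * sum(Q)**2 * h * sum(1.0 / U((i + 0.5) * h, Q) for i in range(steps))

unequal = [0.1 * j for j in range(1, 51)]   # segment lengths of the unequal case in Section 6
for t in (0.05, 0.35, 1.0, 3.3):
    print(t, s2_rigid_closed(t, unequal), s2_rigid_numeric(t, unequal))
```

Tied segment lengths are harmless in this sketch: an empty interval contributes $\log 1 = 0$.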



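Calculating $S_1(\phi_R)$ is more difficult and we only give the special case $Q_1 = \cdots = Q_p = Q$.
Setting $s = t/Q$, we then have

$$S_1(\phi_R) = 4pQ^3\gamma(s), \tag{18}$$

where $\gamma$ is defined in (11). To evaluate $\gamma$, write

$$\begin{aligned}\gamma(s) &= \frac14\int_0^1\left\{\int_0^x\frac{1\{x - y < s\}}{1 - x + y}\,dy + \int_x^1\frac{1\{y - x < s\}}{1 - y + x}\,dy\right\}^2 dx\\ &= \frac12\int_0^1\log^2\{1 - (x\wedge s)\}\,dx + \frac12\int_0^1\log\{1 - (x\wedge s)\}\log\{(1 - s)\vee x\}\,dx,\end{aligned}$$

where the last step uses the symmetry $x \mapsto 1 - x$.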



Now

$$\int_0^1\log^2\{1 - (x\wedge s)\}\,dx = 2s + 2(1 - s)\log(1 - s)$$

and for $s \le \frac12$,

$$\int_0^1\log\{1 - (x\wedge s)\}\log\{(1 - s)\vee x\}\,dx = -\log^2(1 - s) - 2s\log(1 - s),$$

whereas for $s > \frac12$,

$$\int_0^1\log\{1 - (x\wedge s)\}\log\{(1 - s)\vee x\}\,dx = -2(1 - s)\log(1 - s) - 2s\log s\log(1 - s) + \int_{1-s}^s\log(1 - y)\log y\,dy.$$



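Hence,

$$\gamma(s) = \begin{cases} s + (1 - 2s)\log(1 - s) - \frac12\log^2(1 - s) & \text{if } s \le \frac12,\\ s - s\log s\log(1 - s) + \int_0^{s - 1/2}\log\left(\frac12 - y\right)\log\left(\frac12 + y\right)dy & \text{if } s > \frac12. \end{cases} \tag{19}$$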



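Let us next consider computing $S_2(\phi_I)$. Defining $R(v) = Q_+ - \sum_{j=1}^p(2v - Q_j)_+$, then for
$y < x < Q_\ell$ we have

$$\phi_I((x, \ell), (y, m)) = \frac{1\{x - y < t, \ell = m\}Q_+}{R(x - y)}\left[\frac{1}{1 + 1\{2x - y < Q_\ell\}} + \frac{1}{1 + 1\{2y - x > 0\}}\right].$$

Thus, taking $v = x - y$,

$$\begin{aligned}S_2(\phi_I) &= 2Q_+^2\sum_{\ell=1}^p\int_0^{Q_\ell}\int_0^{x\wedge t}\frac{1}{R(v)^2}\left[\frac{1}{1 + 1\{x + v < Q_\ell\}} + \frac{1}{1 + 1\{x > 2v\}}\right]^2 dv\,dx\\ &= 2Q_+^2\sum_{\ell=1}^p\int_0^{t\wedge Q_\ell}\frac{1}{R(v)^2}\int_v^{Q_\ell}\left[\frac{1}{1 + 1\{x + v < Q_\ell\}} + \frac{1}{1 + 1\{x > 2v\}}\right]^2 dx\,dv.\end{aligned}$$

Now $[1 + 1\{x + v < Q_\ell\}]^{-1} + [1 + 1\{x > 2v\}]^{-1}$ takes on values 2, $\frac32$ and 1 depending on, respectively,
whether none, one or both of $x + v < Q_\ell$ and $x > 2v$ are true. Thus,

$$S_2(\phi_I) = 2Q_+^2\sum_{\ell=1}^p\left\{\int_0^{t\wedge Q_\ell/3}\frac{\frac92 v + (Q_\ell - 3v)}{R(v)^2}\,dv + \int_{t\wedge Q_\ell/3}^{t\wedge Q_\ell/2}\frac{\frac94(2Q_\ell - 4v) + 4(3v - Q_\ell)}{R(v)^2}\,dv + \int_{t\wedge Q_\ell/2}^{t\wedge Q_\ell}\frac{4(Q_\ell - v)}{R(v)^2}\,dv\right\}.$$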



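While it is possible to evaluate these integrals explicitly, the resulting expressions do not
appear to simplify as in the case of the rigid motion estimator. When $Q_1 = \cdots = Q_p = Q$, we do
obtain a fairly simple explicit result. By taking $u = v/Q$, we get

$$\begin{aligned}S_2(\phi_I) &= 2pQ^2\left[\int_0^{s\wedge\frac13}\frac{1 + \frac32 u}{\{1 - (2u - 1)_+\}^2}\,du + \int_{s\wedge\frac13}^{s\wedge\frac12}\frac{\frac12 + 3u}{\{1 - (2u - 1)_+\}^2}\,du + \int_{s\wedge\frac12}^{s}\frac{4 - 4u}{\{1 - (2u - 1)_+\}^2}\,du\right]\\ &= 2pQ^2\left\{\int_0^{s\wedge\frac13}\left(1 + \frac32 u\right)du + \int_{s\wedge\frac13}^{s\wedge\frac12}\left(\frac12 + 3u\right)du + \int_{s\wedge\frac12}^{s}\frac{du}{1 - u}\right\},\end{aligned}$$

so that for $s = t/Q < 1$,

$$S_2(\phi_I) = 2pQ^2 \times \begin{cases} s + \frac34 s^2 & \text{if } 0 < s \le \frac13,\\ \frac{1}{12} + \frac12 s + \frac32 s^2 & \text{if } \frac13 < s \le \frac12 \text{ and}\\ \frac{17}{24} - \log 2 - \log(1 - s) & \text{if } \frac12 < s < 1. \end{cases} \tag{20}$$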
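Expression (20) can be compared with a brute-force evaluation of $S_2(\phi_I) = \sum_\ell\int\int\phi_I^2$. The Python sketch below assumes $p$ equal segments of length $Q$, so that $Q_+ = pQ$ and $R(v) = pQ - p(2v - Q)_+$. For $y < x$ in the same segment with $x - y < t$, it uses the weight $\phi_I = Q_+R(x - y)^{-1}[\{1 + 1(2x - y < Q)\}^{-1} + \{1 + 1(2y - x > 0)\}^{-1}]$. It integrates $\phi_I^2$ over the triangle $y < x$ with a midpoint rule and doubles the result by symmetry. It also takes (20) to be $2pQ^2$ times $s + \frac34 s^2$, $\frac1{12} + \frac12 s + \frac32 s^2$ or $\frac{17}{24} - \log 2 - \log(1 - s)$ on $(0, \frac13]$, $(\frac13, \frac12]$ and $(\frac12, 1)$. The function names are ours.

```python
import math

def s2_iso_formula(s, p=1, Q=1.0):
    # Expression (20), equal segment lengths, s = t/Q in (0, 1).
    if s <= 1.0 / 3.0:
        f = s + 0.75 * s * s
    elif s <= 0.5:
        f = 1.0 / 12.0 + 0.5 * s + 1.5 * s * s
    else:
        f = 17.0 / 24.0 - math.log(2.0) - math.log(1.0 - s)
    return 2.0 * p * Q * Q * f

def phi_iso(x, y, t, Q, p):
    # Modified isotropic weight for y <= x in the same segment, with p equal
    # segments of length Q: Q_+ = pQ and R(v) = pQ - p(2v - Q)_+.
    v = x - y
    if v >= t:
        return 0.0
    R = p * Q - p * max(2.0 * v - Q, 0.0)
    a = 1.0 / (2.0 if 2.0 * x - y < Q else 1.0)
    b = 1.0 / (2.0 if 2.0 * y - x > 0.0 else 1.0)
    return p * Q / R * (a + b)

def s2_iso_numeric(s, p=1, Q=1.0, n=600):
    # S_2(phi_I) = 2p times the integral of phi_I^2 over y < x in one segment;
    # midpoint rule on an n x n grid, with diagonal cells weighted 1/2.
    t, h, total = s * Q, Q / n, 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for k in range(i + 1):
            w = 0.5 if k == i else 1.0
            total += w * phi_iso(x, (k + 0.5) * h, t, Q, p) ** 2
    return 2.0 * p * total * h * h

for s in (0.2, 0.4, 0.7):
    print(s, s2_iso_formula(s), s2_iso_numeric(s))
```

The integrand jumps along the lines $x + v = Q$, $x = 2v$ and $v = t$, so the midpoint rule converges only at first order here. Agreement to about a percent is all that should be expected on this grid.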
Figure 1: Ratio of asymptotic mean squared error (amse) of $\tilde K_I(t)$ to that of $\tilde K_R(t)$ for
$p$ segments of length $Q$ as $p \to \infty$ under the Poisson model.



Figure 2: Plot of $4\{\gamma(s) - s^2\}/\{-\log(1 - s)\}$. Multiplying this ratio by $\lambda Q$ gives the
relative increase in asymptotic mean squared error as $p \to \infty$ due to using $\hat K_R(t)$ rather
than $\tilde K_R(t)$ for $p$ segments of length $Q$ under the Poisson model with $s = t/Q$.
Figure 3: Plots of $K(t) - 2t$ for the renewal processes with waiting time densities $4xe^{-2x}$
for $x > 0$ (solid line) and $24/(2 + x)^4$ for $x > 0$ (dashed line).
Figure 4: Mean squared errors for $\hat K_R$ for three renewal processes and unequal or equal
segment lengths. The waiting time densities are:

$e^{-x}$ for $x > 0$ (dotted line),

$4xe^{-2x}$ for $x > 0$ (solid line) and

$24/(2 + x)^4$ for $x > 0$ (dashed line).
Figure 5: Relative differences in mean squared errors of $\hat K_R$ and $\tilde K_R$: $\mathrm{mse}(\tilde K_R)/\mathrm{mse}(\hat K_R) - 1$.
Line types have the same meaning as in Figure 4.
Figure 6: Relative differences in mean squared errors of $\tilde K_R$ and $\check K_R$: $\mathrm{mse}(\check K_R)/\mathrm{mse}(\tilde K_R) - 1$.
Line types have the same meaning as in Figure 4.
Figure 7: Relative differences in mean squared errors of $\tilde K_R$ and $\hat K_R$ (solid line) and
of $\check K_R$ and $\hat K_R$ (dashed line) for the renewal process with waiting times having density
$\frac{6^6}{5!}x^5e^{-6x}$ for $x > 0$, for equal and unequal segment lengths. Where the dashed
line is not visible, it coincides with the solid line.
Figure 9: Lower limits of 95% confidence intervals for $K(t) - K(100) - 2(t - 100)$ for
the absorber catalog.